The Cholesky Approach: A Cautionary Note

Similar documents
Accommodation of genotypeenvironment

Chapter 4: Factor Analysis

Variance Component Models for Quantitative Traits. Biostatistics 666

Sex-limited expression of genetic or environmental

Genetic simplex model in the classical twin design. Conor Dolan & Sanja Franic. Boulder Workshop 2016

STRUCTURAL EQUATION MODELING. Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013

Key Algebraic Results in Linear Regression

Phenotypic factor analysis

NIH Public Access Author Manuscript Behav Genet. Author manuscript; available in PMC 2009 October 6.

Models with multiple random effects: Repeated Measures and Maternal effects

Gene± environment interaction is likely to be a common and

Specifying Latent Curve and Other Growth Models Using Mplus. (Revised )

Model II (or random effects) one-way ANOVA:

Simultaneous Genetic Analysis of Longitudinal Means and Covariance Structure in the Simplex Model Using Twin Data

Continuously moderated effects of A,C, and E in the twin design

Multiple random effects. Often there are several vectors of random effects. Covariance structure

Continuously moderated effects of A,C, and E in the twin design

Simultaneous Genetic Analysis of Means and Covariance Structure: Pearson-Lawley Selection Rules

A User's Guide To Principal Components

Partitioning Genetic Variance

VALUES FOR THE CUMULATIVE DISTRIBUTION FUNCTION OF THE STANDARD MULTIVARIATE NORMAL DISTRIBUTION. Carol Lindee

Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17

Time Series Analysis for Macroeconomics and Finance

Factor Analysis. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Applied Multivariate Analysis

FACTOR ANALYSIS AS MATRIX DECOMPOSITION 1. INTRODUCTION

Lecture 6: Selection on Multiple Traits

Lecture 7 Correlated Characters

Biometrical Genetics. Lindon Eaves, VIPBG Richmond. Boulder CO, 2012

Modeling Log Data from an Intelligent Tutor Experiment

Multivariate Distributions

Stock Prices, News, and Economic Fluctuations: Comment

Multivariate Analysis in The Human Services

Factor Analysis. Qian-Li Xue

3/10/03 Gregory Carey Cholesky Problems - 1. Cholesky Problems

Introduction to Multivariate Genetic Analysis. Meike Bartels, Hermine Maes, Elizabeth Prom-Wormley and Michel Nivard

Exploring Cultural Differences with Structural Equation Modelling

MIXED MODELS THE GENERAL MIXED MODEL

Preface. List of examples

Measuring relationships among multiple responses

Notes on Twin Models

A Rejoinder to Mackintosh and some Remarks on the. Concept of General Intelligence

The 3 Indeterminacies of Common Factor Analysis

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

Introduction to Factor Analysis

SHOPPING FOR EFFICIENT CONFIDENCE INTERVALS IN STRUCTURAL EQUATION MODELS. Donna Mohr and Yong Xu. University of North Florida

To cite this article: Edward E. Roskam & Jules Ellis (1992) Reaction to Other Commentaries, Multivariate Behavioral Research, 27:2,

The Specification of Causal Models with Tetrad IV: A Review

Ability Bias, Errors in Variables and Sibling Methods. James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006

STAT 730 Chapter 9: Factor analysis

Feb 21 and 25: Local weighted least squares: Quadratic loess smoother

SUPPLEMENTARY INFORMATION

The Regression Tool. Yona Rubinstein. July Yona Rubinstein (LSE) The Regression Tool 07/16 1 / 35

Consequences of measurement error. Psychology 588: Covariance structure and factor models

CORRELATIONS ~ PARTIAL REGRESSION COEFFICIENTS (GROWTH STUDY PAPER #29) and. Charles E. Werts

Basic Sampling Methods

Combining SEM & GREML in OpenMx. Rob Kirkpatrick 3/11/16

Lagrangian Description for Particle Interpretations of Quantum Mechanics Single-Particle Case

Multivariate Statistics

=, v T =(e f ) e f B =

Supplementary File 3: Tutorial for ASReml-R. Tutorial 1 (ASReml-R) - Estimating the heritability of birth weight

SEM with observed variables: parameterization and identification. Psychology 588: Covariance structure and factor models

4. Path Analysis. In the diagram: The technique of path analysis is originated by (American) geneticist Sewell Wright in early 1920.

1 Teaching notes on structural VARs.

An Introduction to Path Analysis

Intermediate Social Statistics

Online Appendices for: Modeling Latent Growth With Multiple Indicators: A Comparison of Three Approaches

Chapter 8. Models with Structural and Measurement Components. Overview. Characteristics of SR models. Analysis of SR models. Estimation of SR models

Systematic error, of course, can produce either an upward or downward bias.

WHAT IS STRUCTURAL EQUATION MODELING (SEM)?

Advising on Research Methods: A consultant's companion. Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand

Principal Component Analysis & Factor Analysis. Psych 818 DeShon

Factor Analysis (10/2/13)

Principal Components Analysis (PCA)

Matrix Algebra. Matrix Algebra. Chapter 8 - S&B

A Cautionary Note on the Use of LISREL s Automatic Start Values in Confirmatory Factor Analysis Studies R. L. Brown University of Wisconsin

Structural equation modeling

UCLA Department of Statistics Papers

Multivariate Analysis. Hermine Maes TC19 March 2006

APPLIED STRUCTURAL EQUATION MODELLING FOR RESEARCHERS AND PRACTITIONERS. Using R and Stata for Behavioural Research

1 Description of variables

An Introduction to Mplus and Path Analysis

R = µ + Bf Arbitrage Pricing Model, APM

Matrix Factorizations

1 A factor can be considered to be an underlying latent variable: (a) on which people differ. (b) that is explained by unknown variables

Some Multivariate techniques Principal components analysis (PCA) Factor analysis (FA) Structural equation models (SEM) Applications: Personality

Notes on Time Series Modeling

Bare minimum on matrix algebra. Psychology 588: Covariance structure and factor models

Measurement exchangeability and normal one-factor models

New trends in Inductive Developmental Systems Theory: Ergodicity, Idiographic Filtering and Alternative Specifications of Measurement Equivalence

The Instability of Correlations: Measurement and the Implications for Market Risk

Correlations with Categorical Data

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts

Volume Title: The Interpolation of Time Series by Related Series. Volume URL:

Residuals in Time Series Models

RANDOM INTERCEPT ITEM FACTOR ANALYSIS. IE Working Paper MK8-102-I 02 / 04 / Alberto Maydeu Olivares

Latent random variables

VAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where:

Dimensionality Assessment: Additional Methods

Transcription:

Behavior Genetics, Vol. 26, No. 1, 1996 The Cholesky Approach: A Cautionary Note John C. Loehlin I Received 18 Jan. 1995--Final 27 July 1995 Attention is called to a common misinterpretation of a bivariate Cholesky analysis as if it were a common and specific factor analysis. It is suggested that an initial Cholesky behavior genetic analysis should often be transformed into a different form for interpretation. Formulas are provided for four transformations in the bivariate case. KEY WORDS: Cholesky; triangular decomposition; simplex; multivariate behavior-genetic analysis. Cholesky factoring, or triangular decomposition, is becoming a popular approach to multivariate behavior genetic problems (see, e.g., Neale and Cardon, 1992). In such a procedure, illustrated in Fig. 1 for three variables, the first latent variable, F~, has effects on all the variables V~ to V3; the second, F2, is uncorrelated with the first and has effects on the remaining variables V2 and V3; and the last, F3, is specific to V 3. One use of the Cholesky procedure is in temporal contexts. For example, Vt to V 3 might represent measurements of some variable at three successive times. In this case, F~ would represent causes present at time 1 which affect the observed variable at time 1 and on subsequent occasions; F2 would represent additional causes which arise by time 2 and whose effects are added to those of F1 from time 2 on; and, finally, F3 represents new causes at time 3 which affect only the last measurement, V 3. However, a Cholesky decomposition can also represent a multivariate analysis of simultaneously measured variables considered in some rationally defined order of priority. In this case, F 1 is assigned the first priority, to explain V t and as much of V 2 and /I3 as it can. Then Fz, given second priority, explains what is left of V 2 and as much as it can of V 3. Finally, F 3 takes care of what is left of V 3. 1Department of Psychology, University of Texas, Austin, Texas 78712. E-mail: loehlin@psy.utexas,edu. \ \,) Fig. 1. Example of Cholesky decomposition. F~, F2, and F3, Cholesky factors; Vl, V~, and V3, variables. It should be emphasized that the explanation arrived at depends on the ordering--if we had considered the latent variables in the reverse order, F 3 would be a factor with paths to all the variables, and F~ a residual. Only in the case of uncorrelated variables is the order of selection immaterial. Any of the six possible orderings of the three latent variables in Fig. 1 explains the variance--covariance matrix among the variables equally well. Any solution can be transformed into any other at will. The justification of one over another as an explanation of the data depends entirely on the logic underlying the sequence in which the variables are considered. In the temporal case, there is 65 0001-8244/96/0100-0065509,50/0 9 1996 Plenum Publishing Corporation

66 Loehlin a compelling underlying ordering, that of time sequence. In the general multivariate case, the ordering must be rationally justified. In many situations, rather than undertake such a justification, a user might prefer to transform an initial Cholesky solution into some alternative form that does not make this demand, along lines to be discussed shortly. In typical behavior-genetic applications, each of the latent variables in a Cholesky analysis is decomposed into genetic and environmental components by obtaining data on the observed variables in an appropriate sample of twins, adoptees, or other relatives. An example of such a decomposition is given in Fig. 2, based on the familiar A, C, and E model of additive effects, common environmental effects, and unique environment plus error (e.g., Heath et al., 1989). Other variations, such as solving for D instead of C, dividing the C or E term into more than one component, and so on, would follow the same principles. A common misinterpretation of Cholesky analyses in the bivariate case is illustrated by Fig. 3. (For simplicity, all latent and observed variables are assumed to be in standard-score form.) The analysis on the left, labeled (i), is a Cholesky analysis, with variable A taken as primary. Ft is the sole cause of variable A and a partial cause of B. F 2 is a residual, representing the variance in B independent of F1 and A. The analysis on the right, (ii), is a common and specific factor analysis, with F c a factor affecting both A and B, and F A and FB factors specific to A and B, respectively. The error is to speak of analysis (i) as though it were analysis (ii). An example would be to take a two-occasion measurement analyzed by a Cholesky model and describe the genetic component of F~ as representing common genetic influences or genetic influences affecting the phenotype at both ages (e.g., Plomin et al., 1994, p. 209). This latent variable does include such influences, so the statement is not entirely incorrect, but it also includes genetic influences that are not common to both ages but specific to time 1. Another example (Dully et al., 1994) involves two variables measured concurrently. Two indices of Type A personality were analyzed via a bivariate Cholesky analysis. The variance corresponding to A and E components of F~ is described as "common additive genetic" and "common unshared environment"---despite the likely presence of a specific component in the for- Fig. 2. Cholesky factors of Fig. 1 decomposed into additive genetic (A), common environmental (C), and unshared environmental (E) components. (i) (~i).77 6O tab =.64 tab --.64 Fig. 3. Bivariate Cholesky model (i) compared to common and specific factors model (ii). A and B, observed variables. F~ and F2, Cholesky factors. F o common factor; FA and FB, specific factors. mer, and the almost certain presence of one in the latter. In still other cases of Cholesky analyses, although the description of the analysis is accurate, a full justification of the ordering of the variables is skimpy or absent (e.g., Truett et al., 1992). In cases such as the above, the straightforward separation of influences into those that are shared and those that are specific would require a model 'of type (ii) rather than that of type (i). Type (ii) models are not generally solvable in the bivariate case, unless additional restrictions are imposed. On possibility is to require the paths from Fc to A and B to be equal. An example is provided by Schmitz et al. (1994). A model of type (ii) was fit to data on early childhood Internalizing and Externalizing behaviors by equating the paths to the two variables from the common A, C, and E factors. (Note that such a procedure normally makes sense only if the two variables are measured on comparable scales or are standardized.) With three or more variables,

The Cholesky Approach 67 (i) (ii) total A residual B specific A common specific B (iii) (iv) total A,o,o,A 0c b E (v) residual A Fig. 4. Five alternative bivariate behavior genetic models: (i) Cholesky, (ii) common and specific factors, (iii) correlated factors, (iv) simplex, and (v) opposite Cholesky. In squares, A and B observed variables. In circles, A, C, and E, additive genetic, common environmental, and unshared environmental latent variables. Correlations: ra, rc, and re, additive genetic, common environmental, and unshared environmental correlations between corresponding components of A and B. Paths: ha, CA, ea, hb, C8, and eb, additive genetic, common environmental, and unshared environmental paths corresponding to univariate analyses of A and B; x, y, z, u, v, w, o, p, and q, residuals; ba, bc, and be, paths between additive genetic, common, environmental, and unshared environmental latent variables at times A and B. Remaining path labels arbitrary, for use in Table I equations. model (ii) becomes the standard Spearman model of a general plus specific factors and, as such, is solvable except in special cases. Some recent behavior-genetic examples include Baker et al. (1991), Buhrich et al. (1991), and Petrill and Thompson (1993). It would be easy to assume that at least the residuals for the second variable in type (i) and type (ii) models are the same, but they are not. Figure 3 contains numerical values derived from an assumed observed correlation of 0.64 between A and B. Note that the specific contribution to B differs depending on which model is being considered. If, as in (ii), a common factor equally correlated with A and B is assumed, 64% of B's variance is explained by it, and 36% is residual. But if model (i) is assumed, only 41% of B's variance is explained by F~ and 59% is residual. Thus even an analysis of just the second variable into a part shared with the rfirst and a part that is specific (cf. Duffy et al., 1994, p. 473) can be misleading, if based on a Cholesky analysis. As always in latent

68 Loehlin Table 1. Transforming Bivariate Cholesky (i) to Other Models in Fig. 4 (All Latent and Observed Variables Assumed Standardized) Model (ii), common and specific Model (iii), correlated factors Model (iv), simplex Model (v), opposite Cholesky m = n = ca = "v~ 2 + yb 2 Ca same as (iii) cb ea = ",/k-'tt za 2 ea eb same as (iii) ua = VhA ~ - F v A = ~CA 2 -- m 2 W A -~- "t~/ea2 -- n 2 ra = VhB ba = #ha r= iha/h B rc =j/cb bc =flcb s =jca/cb re = k/eb be = k/eb t= kea/e s u a = X/i 2 +:ca 2 -- F va = -~/j2 + ya2 _ m 2 w a = ~/k 2+ zs 2 -- n 2 ca same as (i) CA same as (i) YA = ~/~A 2 -- S2 e A ea ZA = e~a 2 -- t 2 o = ",/1 - ba ~ p = X/I - bc 2 q = X/1 - bl~ 2 variable modeling, care must be taken that the language used in the interpretation corresponds to the analysis that was actually carried out. One possible way to proceed in many cases is to obtain an initial Cholesky solution and transform it into the desired form for interpretation. Alternatively, one could fit the desired solution directly, but the Cholesky is easy to program and solve, and provides a reasonable starting point if one wants to look at several alternatives. [Some mathematical virtues of the Cholesky solution are discussed by Neale and Cardon (1992).] Figure 4 represents five solutions for the twovariable case. Models (i) and (ii) are the A,C,E versions of the diagrams in Fig. 3--the Cholesky and a common and specific factor model. Model (iii) is what Neale and Cardon (1992, p. 270) call the "correlated factors" model. In this, each variable is separately decomposed into its genetic and environmental components, and the correlations of these across variables are estimated. Thus, for example, variables A and B might each be highly heritable but be influenced by different genes (ra = 0). At the same time, it might be that the few environmental events that do affect A and B tend to influence both--whether these events are shared by family members or unique to an individual (rc and r E large). Model (iv) is a simplex (e.g., Boomsma and Molenaar, 1987), in which causes present at time A partially persist until time B (the regressions ba, etc.), at which time new causes may enter (the residuals). Finally, model (v) is a second Cholesky analysis taking trait B as primary--one might sometimes want to consider how different the effects are of making the two extreme assumptions about the causal priority of A and B. Table I gives simple formulas for transforming the bivariate Cholesky solution in (i) into any of the other four. Those for the correlated factors solution (iii) are given in a slightly different form by Neale and Cardon (1992, p. 272). It should be emphasized that these are all alternative ways of looking at the same facts. All produce exactly the same implied variance-covariance matrix and, hence, the identical overall goodness of fit. However, they represent different causal models of what is going on, and the differences in the values of their paths reflects this fact. Unless a strict logical priority holds between variable A and variable B, models (ii) and (iii) are likely to be the most readily interpretable. Model (i) should remain useful for dealing with a single variable measured on two occasions, although (iv), (ii), and (iii) can also be employed in this context-- the last as reflecting genetic and environmental correlations over time (e.g., Plomin and DeFries, 1981). It should be noted that a given transformation may not always be possible. The application

The Cholesky Approach 69 of a formula can sometimes lead to impossible resuits, such as a negative square root or a negative residual variance. This simply means that a model of the desired form will not do the job. However, a variant of the solution constrained to avoid the undesirable feature may still be acceptable. This is probably best achieved by fitting the desired variant directly, with the necessary constraints imposed. It should also be noted that simple transformational formulas of the present kind do not necessarily generalize easily to larger numbers of variables, although the interpretational issues apply to such cases as well. In summary, the approach to multivariate behavior-genetic analysis via Cholesky decomposition can be a useful first step, but one that should be interpreted with care and, in many cases, not left as a final solution. One can make an analogy to the use of principal components in factor analysis: principal components are computationally convenient and have many attractive mathematical properties, but they are usually not interpreted without further transformation. In fact, it is of some interest that the Cholesky procedure was itself at one time in use in factor analysis as a method of initial factoring, under the rubric of the diagonal method (Harman, 1976, p. 101). The further development and use of rotational or other transformational procedures for following up initial solutions in behavior genetic analysis would seem to be a worthwhile goal, as pointed out some years ago by Crawford and DeFries (1978). ACKNOWLEDGMENTS I appreciate the helpful comments of two anonymous referees. REFERENCES Baker, L. A., Vernon, P. A., and Ho, H.-Z. (1991). The genetic correlation between intelligence and speed of information processing. Behav. Genet. 21:351-367. Boomsma, D. I., and Molenaar, P. C. M. (1987). The genetic analysis of repeated measures. I. Simplex models. Behav. Genet. 17:111-123. Buhrich, N., Bailey, J. M., and Martin, N. G. (1991). Sexual orientation, sexual identity, and sex-dimorphic behaviors in male twins. Behav. Genet. 21:75-96. Crawford, C. B., and DeFries, J. C. (1978). Factor analysis of genetic and environmental correlation matrices. Multivar. Behav. Res. 13:297-318. Dully, D. L., Manicavasagar, V., O'Connell, D., Silove, D., Tennant, C., and Langelludecke, P. (1994). Type A personality in Australian twins. Behav. Genet. 24:469-475. Harman, H. H. (1976). Modern Factor Analysis, 3rd ed., University of Chicago Press, Chicago. Heath, A. C., Neale, M. C., Hewitt, J. K., Eaves, L. J., and Fulker, D. W. (1989). Testing structural equation models for twin data using LISREL. Behav. Genet. 19:9-35. Neale, M. C., and Cardon, L. R. (1992). Methodology for Genetic Studies of Twins and Families, Kluwer, Dordrecht, The Netherlands. Petrill, S. A., and Thompson, L. A. (1993). The phenotypic and genetic relationships among measures of cognitive ability, temperament, and scholastic achievement. Behav. Genet. 23:511-518. Plomin, R., and DeFries, J. C. (1981). Multivariate behavioral genetics and development: Twin studies. In Gedda, L., Parisi, P., and Nance, W. E. (eds.), Twin Research 3. Part B: Intelligence, Personality, and Development, Liss, New York, pp. 25-33. Plomin, R. Pedersen, N. L., Lichtenstein, P., and McClearn, G. E. (1994). Variability and stability in cognitive abilities are largely genetic later in life. Behav. Genet. 24:207-215. Schmitz, S., Cherny, S. S., Fulker, D. W., and Mrazek, D. A. (1994). Genetic and environmental influences on early childhood behavior. Behav. Genet. 24:25-34. Truett, K. R., Eaves, L. J., Meyer, J. M., Heath, A. C., and Martin, N. G. (1992). Religion and education as mediators of attitudes: A multivariate analysis. Behav. Genet. 22: 43-62. Edited by David W. Fulker