Model Estimation Example

Ronald H. Heck, EDEP 606: Multivariate Methods (S2013), April 7, 2013

As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions of various model estimation methods come up regularly in factor analysis, structural equation models, mixed (or multilevel) models, and generalized linear models (i.e., models for dichotomous, ordinal, multinomial, and count outcomes). Model estimation attempts to determine the extent to which a model-implied covariance (or correlation) matrix is a good approximation of the sample covariance matrix. In general, confirmation of a proposed model relies on retention of the null hypothesis, that is, that the data are consistent with the hypothesized model (Marcoulides & Hershberger, 1997). Failure to reject this null hypothesis implies that the proposed model is a plausible representation of the data, although it may not be the only plausible representation of the data.

As Marcoulides and Hershberger (1997) note, evaluating the difference between the two covariance matrices depends on the estimation method used to solve for the model's parameters [e.g., generalized least squares (GLS), maximum likelihood (ML), weighted least squares (WLS)]. Each approach proceeds iteratively, solving the model-implied equations until an optimal solution for the model parameters is obtained (i.e., a solution where the implied covariance matrix is as close as possible to the observed covariance matrix). The difference between the model-implied matrix and the sample matrix is summarized by a discrepancy function, that is, a way of weighting the differences between the observed (S) and model-implied (Ŝ) covariance matrices. In matrix terms, we can define this as

F = (s - ŝ)' W (s - ŝ),   (1)

where s and ŝ are the nonduplicated elements of the observed and implied covariance matrices S and Ŝ, arranged as vectors. The goal of the analysis is to minimize this function by taking its partial derivatives with respect to the model parameters, which determine the elements of the implied covariance matrix Ŝ. So, for example, if we have a 3 x 3 covariance matrix, the lower part of the matrix becomes a six-element vector (3 variances and 3 covariances), and (s - ŝ) contains the differences between the corresponding elements of the two covariance matrices (Loehlin, 1992).

The exact form of the discrepancy function is different for each estimation method, and each has its own set of advantages and disadvantages. In Eq. 1 above, W is a weight matrix, and different versions of it (i.e., ML, GLS, WLS) yield different criteria for weighting the differences between the corresponding elements of the observed and implied covariance matrices. If W in Eq. 1 is an identity matrix (I), which has 1s as the diagonal elements and 0s as the off-diagonal elements, the expression reduces to (s - ŝ)'(s - ŝ). This is just the sum of the squared differences between the elements of the observed and implied covariance matrices, which happens to be the ordinary least squares (OLS) criterion.
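
To make Eq. 1 concrete, here is a minimal sketch in Python; the 3 x 3 matrices are hypothetical and chosen only for illustration. It shows that with W set to an identity matrix the discrepancy function is just the sum of squared differences between the nonduplicated elements.

```python
import numpy as np

# Hypothetical 3 x 3 observed (S) and model-implied (S_hat) covariance matrices.
S = np.array([[4.0, 1.2, 0.8],
              [1.2, 3.0, 0.5],
              [0.8, 0.5, 2.5]])
S_hat = np.array([[4.0, 1.0, 0.9],
                  [1.0, 3.0, 0.4],
                  [0.9, 0.4, 2.5]])

# The nonduplicated (lower-triangular) elements arranged as vectors:
# 3 variances + 3 covariances = 6 elements each.
idx = np.tril_indices(3)
s, s_hat = S[idx], S_hat[idx]

def discrepancy(s, s_hat, W):
    """F = (s - s_hat)' W (s - s_hat), Eq. 1."""
    d = s - s_hat
    return d @ W @ d

# With W = I the function reduces to the sum of squared differences,
# i.e., the OLS (and ULS) criterion.
W = np.eye(len(s))
print(discrepancy(s, s_hat, W))        # 0.06 for these hypothetical matrices
print(np.sum((s - s_hat) ** 2))        # same value
```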

Unweighted least squares (ULS) estimation is the same as OLS in this respect: the weight matrix applied to the differences is also just an identity matrix. Loehlin (1992) describes this as squaring the differences because the expression above is simply the product of the deviation vector with itself. If the two matrices are identical, the value of the expression will be 0; the greater the difference between the two matrices, the larger the squared differences between their elements become. The sum of these squared differences is the discrepancy function (F). The larger the discrepancy function becomes, the worse the fit, which implies less similarity between the elements of the two matrices. Model estimation involves trying to minimize F by seeking values of the unknown model parameters that make the implied covariance matrix as much like the observed covariance matrix as possible (Loehlin, 1992). OLS (or ULS) estimation is most appropriate when the variables in the covariance matrix are measured on the same type of scale.

In comparison to OLS (or ULS) estimation, GLS, ML, and WLS require considerably more computation. As Loehlin (1992) notes, for variables that are normally distributed (or relatively so), Eq. 1 reduces to the following:

F = ½ tr{[(S - Ŝ)V]²},   (2)

where tr is the trace of the matrix (i.e., the sum of its diagonal elements) and V is another weight matrix. This formulation helps clarify the differences between ULS, GLS, and ML estimation. As noted, for ULS the weight matrix is the identity (V = I). For GLS, it is the inverse of the sample covariance matrix (V = S⁻¹), and for ML it is defined as the inverse of the model-implied covariance matrix (V = Ŝ⁻¹). Because the ML discrepancy function uses the inverse of the model-implied covariance matrix Ŝ, which has to be recalculated at each iteration, ML estimation is more computationally demanding under certain conditions. It should be noted that ML is typically defined somewhat differently from Eq. 2 (Loehlin, 1992):

F_ML = tr(SŜ⁻¹) - p + ln|Ŝ| - ln|S|,   (3)

that is, the discrepancy function is defined in terms of the trace of the product of the sample covariance matrix and the inverse of the model-implied covariance matrix, and the natural logarithms of the determinants of the model-implied and sample covariance matrices, given the number of variables (p) in the matrix. Minimizing this function leads to essentially the same solution (Loehlin, 1992); however, it is often advantageous to work with logarithms, which can make the discrepancy function easier to solve. Note that in each of these cases we are assuming that only the covariance matrices are being estimated (modeling mean structures simply requires additional terms in each discrepancy function).

As this discussion suggests, each general approach to model estimation rests on a somewhat different set of assumptions and statistical theory underlying the estimation of various kinds of models. Since GLS uses the inverse of the observed covariance matrix S as the weight matrix, an advantage is that this weight matrix only needs to be calculated once, because S does not change (GLS has therefore been described as ML estimation with a single iteration). ML, as noted, depends on the model-implied covariance matrix and therefore typically involves more complex calculations; with multivariate normality and large sample sizes, however, GLS and ML will produce very similar estimates.
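
The role of the weight matrix V in Eq. 2, and the more common ML form in Eq. 3, can be sketched the same way. The matrices below are again hypothetical; this illustrates the formulas themselves, not any particular software's implementation.

```python
import numpy as np

# Hypothetical observed (S) and model-implied (S_hat) covariance matrices.
S = np.array([[4.0, 1.2, 0.8],
              [1.2, 3.0, 0.5],
              [0.8, 0.5, 2.5]])
S_hat = np.array([[4.0, 1.0, 0.9],
                  [1.0, 3.0, 0.4],
                  [0.9, 0.4, 2.5]])
p = S.shape[0]                                      # number of variables

def F_eq2(S, S_hat, V):
    """F = 1/2 * tr{[(S - S_hat) V]^2}, Eq. 2."""
    M = (S - S_hat) @ V
    return 0.5 * np.trace(M @ M)

F_uls = F_eq2(S, S_hat, np.eye(p))                  # V = I
F_gls = F_eq2(S, S_hat, np.linalg.inv(S))           # V = S^-1, computed once
F_mlv = F_eq2(S, S_hat, np.linalg.inv(S_hat))       # V = S_hat^-1, recomputed at each iteration

# The more common form of the ML discrepancy function, Eq. 3.
F_ml = (np.trace(S @ np.linalg.inv(S_hat)) - p
        + np.log(np.linalg.det(S_hat)) - np.log(np.linalg.det(S)))

print(F_uls, F_gls, F_mlv, F_ml)
```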

In cases where the outcomes are categorical (e.g., dichotomous, ordinal), estimation is considerably more complex than in OLS regression models for continuous outcomes, since such models depend on estimating probability relationships that follow sampling distributions other than the normal distribution. These models (referred to as generalized linear models) therefore require iterative techniques such as ML to solve the implied set of relationships.

GLS and ML can be used to derive a chi-square fit index through the calculation of

χ² = (N - 1)F_min,   (4)

where F_min is the value of the discrepancy function at the point of best fit and N is the sample size. As you have likely encountered, however, this model fit index is not always favored because of its reliance on sample size, which can lead to rejecting relatively good-fitting models in larger samples. Empirical work suggests ML estimation will work reasonably well with skewness up to about +/-2 and kurtosis up to about +/-7 (West, Finch, & Curran, 1995). WLS also provides a chi-square fit statistic, and it does not depend on multivariate normality (i.e., it is often used with ordinal outcomes in SEM). However, the WLS weight matrix is based on the variances and covariances among the elements of the vector s from the observed covariance matrix S. As the original covariance matrix S gets larger, the vector s of its nonduplicated elements increases rapidly in length, and the weight matrix, whose size is the square of the length of that vector, can become quite large and computationally demanding. Therefore, WLS typically requires much larger sample sizes than ML and GLS estimation. For ML and GLS, model convergence problems certainly increase in samples of 100 or fewer (and with fewer than 3 indicators per factor in factor models). Heywood cases in factor models are also very likely to occur under those sorts of conditions. With 150-200 cases and at least 3 indicators per factor, convergence becomes less of a problem.

ML Estimation

ML estimation is probably the approach most often used to estimate various types of models with interval and categorical outcomes, but it does depend on relatively large sample sizes (we can use restricted maximum likelihood in small samples). ML estimation determines the optimal population values for the parameters in a model, that is, the values that minimize the discrepancy between the observed and implied matrices, given the current parameter estimates (Hox, 2010). As noted, in ML estimation the discrepancy function is defined in terms of a likelihood function (or likelihood) that the model with a particular set of estimates could have produced the observed covariance matrix. In many cases (since likelihood functions may be exponential in nature), it is more convenient to work with the natural logarithm of the likelihood function, called the log-likelihood. One advantage of the log-likelihood is that its terms are additive (instead of multiplicative). Because the likelihood of the data can vary from 0.0 to 1.0, rather than maximizing the likelihood function directly, ML minimizes a more conceptually convenient function that is inversely related to the likelihood (the discrepancy function described previously), such that the smaller this discrepancy function is, the greater the likelihood that the model with a particular set of parameter estimates could have produced the sample covariance matrix (S). The value will be 0 if the model fits the data perfectly (i.e., the natural log of 1 = 0).
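
Returning for a moment to Eq. 4, a small numerical sketch (all values hypothetical) shows why that chi-square index is so sensitive to sample size:

```python
from scipy import stats

# Hypothetical values illustrating Eq. 4: the same minimized discrepancy F_min
# yields a much larger chi-square, and hence a smaller p-value, in a larger sample.
F_min, df = 0.05, 10
for N in (201, 2001):
    chi_square = (N - 1) * F_min
    p_value = stats.chi2.sf(chi_square, df)
    print(N, round(chi_square, 1), round(p_value, 4))
# N = 201  -> chi-square = 10.0,  p is about .44 (model retained)
# N = 2001 -> chi-square = 100.0, p < .001 (the same model rejected)
```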

Note also that the log-likelihood is negative because the logarithm of a number between 0 and 1 is negative (e.g., the natural log of 0.2 is -1.61). Estimating the parameters involves making a series of iterative guesses that determine an optimal set of values for the model's parameters, that is, values that minimize the negative of the natural logarithm of the likelihood of the data. Arriving at a set of final estimates is known as model convergence (i.e., the point where the estimates no longer change and the likelihood is therefore at its maximum value). It is important that the model actually reaches convergence, as the resulting parameter estimates will not be trustworthy if it has not. Sometimes increasing the number of iterations will result in a model that converges, but often the failure of the model to converge on a unique solution is an indication that it needs to be changed and re-estimated. Keep in mind that even if a model converges, this does not mean the estimates are the right ones, given the sample data. In the same way, we would not conclude that, because we fail to reject a model as consistent with the observed data, it is the only model that would fit this criterion.

For models with categorical outcomes, the likelihood function is a little different from that for models with continuous outcomes (owing to their different sampling distributions), but the principle of model estimation is the same. In this case, ML estimation often employs Fisher scoring, which uses a likelihood function describing the probability of the observed data over a range of parameter values. For Poisson or binomial distributions this algorithm simplifies to the Newton-Raphson procedure (Azen & Walker, 2011). Both algorithms begin with an initial guess for all the model parameters and then repeatedly adjust those guesses so as to increase the likelihood function. This continues until the estimates no longer change and the iteration process has converged on the final ML estimates (Azen & Walker, 2011); a small sketch of this update-and-check cycle appears below.

ML estimation produces a model deviance statistic (often referred to as -2LL or -2*log likelihood), which is an indicator of how well the model fits the data. We multiply the log likelihood by -2 so it can be expressed easily as a positive number. Models with lower deviance (i.e., a smaller discrepancy function) fit the data better than models with larger deviance. Once we have a solution that converges, we can assess how well the proposed model fits the data using various model fit indices. We can also look at the residuals (or residual matrix), which describe the differences between the model-implied covariance matrix and the actual covariance matrix. Large residuals imply that some aspects of the proposed model do not fit the data well.
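
To illustrate that update-and-check cycle in the simplest possible case, here is a minimal Newton-Raphson sketch for an intercept-only model with a binary outcome. The ten 0/1 values are hypothetical, and the point is only the iteration logic (guess, update, check for convergence, report -2LL), not the cumulative logit model fitted in the example below.

```python
import numpy as np

# Hypothetical 0/1 outcomes; the ML estimate of the intercept (log-odds)
# should converge to log(0.6 / 0.4) = 0.405465.
y = np.array([1, 0, 0, 1, 1, 0, 1, 1, 0, 1])

b = 0.0                                         # initial guess
for step in range(20):
    p = 1 / (1 + np.exp(-b))                    # predicted probability at current b
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    print(step, round(b, 6), round(-2 * log_lik, 4))   # estimate and deviance (-2LL)
    gradient = np.sum(y - p)                    # first derivative of the log likelihood
    hessian = -len(y) * p * (1 - p)             # second derivative
    b_new = b - gradient / hessian              # Newton-Raphson update
    if abs(b_new - b) < 1e-8:                   # estimates no longer change: converged
        break
    b = b_new
```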

An Example Using an Ordinal Outcome

Let's say we wish to estimate a model where the outcome is ordinal and there are two predictors (score on a math test and gender). We will use GENLIN in IBM SPSS, since we can easily print relevant information about the model estimation procedure.

Model 1: Threshold-Only Model (no predictors)

We first estimate a baseline model with no predictors. Below (Table 1) we have information about the type of model: the probability distribution is multinomial, which is appropriate for ordinal outcomes, and the link function (because the outcome is not continuous) is the cumulative logit.

Table 1. Model Information

Dependent Variable          courses a
Probability Distribution    Multinomial
Link Function               Cumulative logit
a. The procedure applies the cumulative link function to the dependent variable values in ascending order.

Below (Table 2) we have the distribution of perceptions about taking additional math courses past Algebra I. It shows that 45.0% of students perceived they would not take any further math classes beyond Algebra I, about 38.5% perceived they would take one additional course, and another 15.3% perceived they would take two additional courses. We can also see that only about 1.3% perceived they would take 3-4 additional courses beyond Algebra I.

Table 2. Categorical Variable Information (Dependent Variable: courses)

courses       N      Percent
0           3901       45.0%
1           3335       38.5%
2           1323       15.3%
3             28        0.3%
4             83        1.0%
Total       8670      100.0%

We first estimate a model with just the thresholds (i.e., the intercepts). In Table 3 we can see the initial log likelihood estimate at the first iteration. Because the likelihood, or probability, of the data can vary from 0.0 to 1.0, it is common to work with its natural log, so the log likelihood in the table below is a negative number. The log of 1 is 0 (which would indicate no discrepancy); so, for example, an initial log likelihood of approximately -9335.36 corresponds to a likelihood that is quite small (just above 0), which suggests that the current model does not fit the data very well. As we add variables, the log likelihood is reduced in magnitude (i.e., it moves closer to 0), which amounts to reducing the discrepancy function (or maximizing the likelihood that the proposed model accounts for the observed data).

Table 3. Iteration History (threshold-only model)

Iteration  Update Type  Step-halvings  Log Likelihood a  [courses=0]  [courses=1]  [courses=2]  [courses=3]  (Scale)
0          Initial      0              -9335.361514        -.200904     1.618601     4.345208     4.639164        1
1          Scoring      0              -9335.361514        -.200904     1.618601     4.345208     4.639164        1
a. The kernel of the log likelihood function is displayed.
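
Because this baseline model has no predictors, its threshold estimates have a simple closed form: they are the logits of the cumulative category proportions in Table 2. A quick Python check against the values in Table 3:

```python
import numpy as np

# Category counts for courses = 0, 1, 2, 3, 4 (from Table 2).
counts = np.array([3901, 3335, 1323, 28, 83])

# Cumulative proportions up to each of the four thresholds, then the logit.
cum_prop = np.cumsum(counts)[:-1] / counts.sum()
thresholds = np.log(cum_prop / (1 - cum_prop))
print(np.round(thresholds, 6))
# [-0.200904  1.618601  4.345208  4.639164] -- matching the threshold-only
# estimates in Table 3 (up to rounding).
```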

Below we can examine various fit criteria for this model.

Table 4. Goodness of Fit (threshold-only model)

                                           Value       df    Value/df
Deviance                                    .000        0
Scaled Deviance                             .000        0
Pearson Chi-Square                          .000        0
Scaled Pearson Chi-Square                   .000        0
Log Likelihood a                       -9335.362
Akaike's Information Criterion (AIC)   18678.723
Finite Sample Corrected AIC (AICC)     18678.728
Bayesian Information Criterion (BIC)   18706.994
Consistent AIC (CAIC)                  18710.994
a. The kernel of the log likelihood function is displayed and used in computing information criteria.

Here are the thresholds between the various categories of the outcome variable.

Table 5. Parameter Estimates (threshold-only model)

Parameter                  B      Std. Error    Wald Chi-Square    df    Sig.
Threshold [courses=0]    -.201        .0216            86.608       1    .000
Threshold [courses=1]    1.619        .0289          3135.510       1    .000
Threshold [courses=2]    4.345        .0955          2068.941       1    .000
Threshold [courses=3]    4.639        .1103          1769.212       1    .000
(Scale)                  1 a
a. Fixed at the displayed value.
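
The information criteria in Table 4 can be reproduced from the kernel log likelihood, the number of estimated parameters (k = 4 thresholds here), and the sample size (n = 8670), assuming the standard definitions of AIC, AICC, BIC, and CAIC:

```python
import numpy as np

# Standard information-criterion formulas applied to the values reported above.
log_lik, k, n = -9335.361514, 4, 8670

aic = -2 * log_lik + 2 * k
aicc = aic + 2 * k * (k + 1) / (n - k - 1)
bic = -2 * log_lik + k * np.log(n)
caic = -2 * log_lik + k * (np.log(n) + 1)
print(round(aic, 3), round(aicc, 3), round(bic, 3), round(caic, 3))
# 18678.723  18678.728  18706.994  18710.994 -- matching Table 4 (up to rounding)
```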

Model 2: Adding Two Predictors (test score and gender)

Of greater interest is what happens when we add predictors to the model. Our assumption is that adding gender and previous test performance will reduce the size of the log likelihood function (i.e., move it closer to 0).

Table 6. Continuous Variable Information

Covariate       N     Minimum    Maximum       Mean    Std. Deviation
test1         8670      24.35      99.99    48.6323          9.71254
female        8670          0          1        .51             .500

Below (Table 7) we can see the iteration history for estimating the model with two predictors. We start with the initial estimate of the log likelihood (which is for the model with no predictors). The model then begins to iterate (using maximum likelihood) to solve the equations in a way that maximizes the likelihood of the data, given the estimated effect of each predictor on the outcome. You can see that it takes several trials, or iterations, to reach an optimal solution for the population estimates from the sample data. At each iteration the estimates of the test score effect and the female (gender) effect change a little, until the convergence criteria are satisfied.

Table 7. Iteration History (model with two predictors)

Iteration  Update Type  Step-halvings  Log Likelihood b  [courses=0]  [courses=1]  [courses=2]  [courses=3]    test1    female  (Scale)
0          Initial      0              -9335.361514        -.200904     1.618601     4.345208     4.639164  .000000   .000000        1
1          Scoring      0              -9098.203381        1.789380     3.608885     6.335493     6.629448  .040716   .020178        1
2          Newton       0              -9074.287944        2.344517     4.239535     7.060096     7.356817  .052873   .019317        1
3          Newton       0              -9070.118786        2.714341     4.619526     7.462558     7.758959  .060816   .018971        1
4          Newton       0              -9070.113473        2.727496     4.633389     7.480902     7.777357  .061097   .018982        1
5          Newton       0              -9070.113473        2.727515     4.633409     7.480936     7.777390  .061097   .018982        1
6          Newton a     0              -9070.113473        2.727515     4.633409     7.480936     7.777390  .061097   .018982        1
Redundant parameters are not displayed; their values are always zero in all iterations.
Model: (Threshold), test1, female
a. All convergence criteria are satisfied.
b. The kernel of the log likelihood function is displayed.

Next, in Table 8 we see a summary of the various fit indices for this model.

Table 8. Goodness of Fit (model with two predictors)

                                           Value        df    Value/df
Deviance                               15809.315     17750        .891
Scaled Deviance                        15809.315     17750
Pearson Chi-Square                     24847.871     17750       1.400
Scaled Pearson Chi-Square              24847.871     17750
Log Likelihood a                       -9070.113
Akaike's Information Criterion (AIC)   18152.227
Finite Sample Corrected AIC (AICC)     18152.237
Bayesian Information Criterion (BIC)   18194.633
Consistent AIC (CAIC)                  18200.633
Model: (Threshold), test1, female
a. The kernel of the log likelihood function is displayed and used in computing information criteria.
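
One way to see what "all convergence criteria are satisfied" means is to look at how the change in the log likelihood shrinks across the iterations in Table 7:

```python
import numpy as np

# The log-likelihood column of Table 7: the change between successive
# iterations shrinks toward zero, which is what convergence refers to.
log_liks = np.array([-9335.361514, -9098.203381, -9074.287944,
                     -9070.118786, -9070.113473, -9070.113473, -9070.113473])
print(np.round(np.diff(log_liks), 6))
# [237.158133  23.915437   4.169158   0.005313   0.         0.      ]
# Once the change falls below the convergence tolerance, iteration stops.
```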

From Table 8 we can see that the log likelihood has been reduced considerably in magnitude in this model (i.e., it is much closer to 0). Some of the other model fit information may be familiar to you (e.g., AIC and BIC). AIC and BIC are computed from the log likelihood (with additional penalty terms). For example, for the AIC index, where k is the number of estimated parameters in the model (here, 4 thresholds plus 2 regression coefficients):

AIC = 2k + (-2LL) = 2(6) + 18,140.227 = 18,152.227.

The likelihood ratio chi-square, which is calculated directly from the change in the log likelihoods between the initial (no predictors) model and the second model (with two predictors), can be used to test whether Model 2 fits the data better than Model 1 (the baseline model).

Table 9. Omnibus Test a

Likelihood Ratio Chi-Square    df    Sig.
                    530.496     2    .000
a. Compares the fitted model against the thresholds-only model.

We can see that the chi-square is significant with 2 degrees of freedom (the two added predictors). Here is how we calculate it from the change in log likelihoods:

Initial log likelihood: -9335.361514
Log likelihood for the model with 2 predictors: -9070.113473
Difference in log likelihoods x 2: 265.248041 x 2 = 530.496082 (the likelihood ratio chi-square with 2 df)
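
The omnibus test in Table 9 (and the AIC computed above) can be reproduced directly from the two log likelihoods; here is a small sketch that uses scipy only to obtain the p-value:

```python
from scipy import stats

# Reproducing Table 9 and the AIC from the two log likelihoods reported above.
ll_null = -9335.361514      # Model 1: thresholds only
ll_full = -9070.113473      # Model 2: thresholds + test1 + female

lr_chisq = 2 * (ll_full - ll_null)        # 530.496082
p_value = stats.chi2.sf(lr_chisq, df=2)   # two added predictors
aic_full = 2 * 6 + (-2 * ll_full)         # k = 6 parameters -> 18152.227
print(round(lr_chisq, 6), p_value, round(aic_full, 3))
```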

Finally, we can see the summary of the parameters in the model. The earlier test score (I think it is an 8th grade test) is a significant predictor of students' perceptions of math course taking beyond Algebra I, while gender is not.

Table 10. Parameter Estimates (model with two predictors)

Parameter                  B      Std. Error    Wald Chi-Square    df    Sig.
Threshold [courses=0]    2.728        .2631           107.494       1    .000
Threshold [courses=1]    4.633        .2760           281.741       1    .000
Threshold [courses=2]    7.481        .3129           571.638       1    .000
Threshold [courses=3]    7.777        .3179           598.492       1    .000
test1                     .061        .0056           117.270       1    .000
female                    .019        .0402              .223       1    .637
(Scale)                  1 a
Model: (Threshold), test1, female
a. Fixed at the displayed value.

We could add further variables and see whether we could reduce the log likelihood further, but we will stop here for now. This should provide an example of how model estimation proceeds and of how the criteria used to estimate the model result in a set of parameters and model fit indices that can be used to evaluate how well the proposed model compares against the actual sample data.

References

Azen, R., & Walker, C. (2011). Categorical data analysis for the behavioral and social sciences. New York: Routledge.

Hox, J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York: Routledge.

Loehlin, J. C. (1992). Latent variable models: An introduction to factor, path, and structural analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Marcoulides, G., & Hershberger, S. (1997). Multivariate statistical methods: A first course. Mahwah, NJ: Lawrence Erlbaum.

West, S., Finch, J., & Curran, P. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56-75). Thousand Oaks, CA: Sage.