SEM with observed variables: parameterization and identification. Psychology 588: Covariance structure and factor models


Limitations of SEM as causal modeling

- If an SEM model reflects reality, the data will be consistent with the model, given that measurement errors are tolerable, all assumptions made are tenable, etc.; but the reverse is not true (e.g., see Fig. 3.9, p. 70)
- Theory- vs. data-driven modeling
- Cause vs. effect indicators
- Simultaneous reciprocal relations (e.g., financial health and stock price of companies) --- really concurrent?
- Be cautious about goodness (mostly badness) of fit testing

Fundamental equation with observed variables

- Dependent variables y are modeled as linear combinations of (a subset of) y and x:

  $y = By + \Gamma x + \zeta$, hence $y = (I - B)^{-1}(\Gamma x + \zeta)$

- This is the hypothesized explanation for the covariances of y; ζ is the part of the variation in y left unexplained (but allowed to exist)
- From the measurement perspective, y and x may be considered single-indicator latent variables with no measurement error; demographic variables (e.g., sex, age) are often treated this way

Fundamental equation with observed variables (cont.)

- Alternatively, y is modeled as a linear combination of a partitioned vector of y, x, and ζ:

  $y = \begin{bmatrix} B & \Gamma & I \end{bmatrix}\begin{bmatrix} y \\ x \\ \zeta \end{bmatrix} = By + \Gamma x + \zeta$

- This is essentially the approach taken by RAM (the reticular action model), though RAM incorporates all latent variables, ξ and η --- to be discussed later
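As a sanity check on the algebra, here is a minimal numpy sketch (the coefficient values are made-up assumptions, not from the lecture) verifying that the reduced form $(I - B)^{-1}(\Gamma x + \zeta)$ satisfies the structural equation:

    import numpy as np

    # Hypothetical recursive model: y1, y2 <- x1, x2 and y3 <- y1, y2
    B = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.5, 0.3, 0.0]])      # lower-triangular B
    G = np.array([[0.8, 0.2],
                  [0.1, 0.7],
                  [0.0, 0.0]])           # Gamma

    rng  = np.random.default_rng(588)
    x    = rng.normal(size=2)
    zeta = rng.normal(size=3)

    y = np.linalg.solve(np.eye(3) - B, G @ x + zeta)   # reduced form
    assert np.allclose(y, B @ y + G @ x + zeta)        # y = By + Gx + zeta holds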

Two conditions for recursive models

- B should be a lower triangular matrix, so that there is no feedback loop of causal paths
- No error terms in ζ are correlated with one another
- Correlated errors themselves don't create a feedback loop; instead, they lead to inconsistent estimates because the errors become correlated with the explanatory variables

[Path diagram: x, y1, y2 with error terms ζ1, ζ2]

Note: any exogenous variables (including error terms) can be set to intercorrelate, while no endogenous variables are allowed to (instead, they are set to correlate through exogenous variables)

Implied covariance matrix

- Basic hypothesis with ideal measurement: $\Sigma = \Sigma(\theta)$ --- the population covariance matrix Σ is a function of the free model parameters θ
- Basic hypothesis in reality: $S \approx \Sigma(\hat\theta)$ --- the sample covariance matrix S is approximated as a function of the parameter estimates $\hat\theta$
- Discrepancy in the data (Σ vs. S) is due to sampling error, while discrepancy in the parameters (θ vs. $\hat\theta$) is due not only to sampling error but also to any violated assumptions underlying a particular way of finding the estimates (e.g., ML)

Implied covariance matrix (cont.)

$\Sigma(\theta) = \begin{bmatrix} E(yy') & E(yx') \\ E(xy') & E(xx') \end{bmatrix} = \begin{bmatrix} (I-B)^{-1}(\Gamma\Phi\Gamma' + \Psi)(I-B)^{-1\prime} & (I-B)^{-1}\Gamma\Phi \\ \Phi\Gamma'(I-B)^{-1\prime} & \Phi \end{bmatrix}$

where $y = (I-B)^{-1}(\Gamma x + \zeta)$, $\Phi = E(xx')$, and $\Psi = E(\zeta\zeta')$
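To make the formula concrete, a minimal numpy sketch (parameter values are illustrative assumptions, continuing the toy model above) that assembles Σ(θ) and checks it against the sample covariance of simulated data:

    import numpy as np

    def implied_sigma(B, G, Phi, Psi):
        """Implied covariance of (y, x) for y = By + Gx + zeta."""
        p = B.shape[0]
        A = np.linalg.inv(np.eye(p) - B)          # (I - B)^{-1}
        Syy = A @ (G @ Phi @ G.T + Psi) @ A.T     # cov(y, y)
        Syx = A @ G @ Phi                         # cov(y, x)
        return np.block([[Syy, Syx], [Syx.T, Phi]])

    B   = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.5, 0.3, 0.0]])
    G   = np.array([[0.8, 0.2], [0.1, 0.7], [0.0, 0.0]])
    Phi = np.array([[1.0, 0.4], [0.4, 1.0]])      # cov(x)
    Psi = np.diag([0.5, 0.6, 0.4])                # cov(zeta), diagonal here

    Sigma = implied_sigma(B, G, Phi, Psi)

    # Monte Carlo check: the sample covariance S approaches Sigma(theta)
    rng  = np.random.default_rng(588)
    n    = 100_000
    x    = rng.multivariate_normal(np.zeros(2), Phi, size=n)
    zeta = rng.multivariate_normal(np.zeros(3), Psi, size=n)
    y    = np.linalg.solve(np.eye(3) - B, G @ x.T + zeta.T).T
    S    = np.cov(np.hstack([y, x]), rowvar=False)
    print(np.abs(S - Sigma).max())                # small; shrinks as n grows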

Identification

Preliminaries:
- SEM is a modeling technique for covariance structures; thus structural models are written and treated in covariance form --- indirect vs. direct fitting
- In most SEM models we consider only the covariance matrix, not the means; in addition, third and higher moments are excluded by imposing multinormality on the observed variables
- Variances and covariances of the observed variables are assumed to be known; thus we represent the model parameters as functions of these "knowns"

Identification in its complete sense means that the unknown model parameters are determined to take particular (preferably unique) values. To identify the unknown parameter values, we take two steps:

- Step 1: Identifiability of model form --- mathematical reasoning about whether a given model takes the form of a solvable problem
- Step 2: Estimation of the parameter values --- numerical optimization of some loss function (e.g., the sum of squared residuals for OLS regression)

Given the degrees of freedom in the covariance data ("knowns"; df1) and in the model ("unknowns"; df2), there are 3 possibilities:

- df1 = df2 --- just identified: the data are exactly reproducible (known to be identified, identifiable)
- df1 > df2 --- over-identified: uniquely identifiable with additional conditions, but imperfect fit
- df1 < df2 --- under-identified: impossible to identify uniquely

[Path diagram: x, y1, y2 with error terms ζ1, ζ2]

Identification: t-rule

$t \le \frac{(p+q)(p+q+1)}{2}$

- t = # of distinct free parameters in θ; p + q = # of observed variables
- Equality constraints reduce t, but inequality constraints don't --- inequality constraints limit the search space of a parameter, and so they may produce more accurate and reliable estimates if they are consistent with the data
- Necessary but not sufficient; only useful for ruling out unidentifiable models
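The bookkeeping is a one-liner; a sketch (the function name is mine), applied to the SES model counted on the next slide:

    def t_rule_ok(t, p, q):
        """t-rule: free parameters cannot exceed the number of distinct
        variances/covariances among the p + q observed variables."""
        return t <= (p + q) * (p + q + 1) // 2

    # SES model: p = 3, q = 2 -> 15 knowns; t = 2 + 4 + 6 + 3 = 15 unknowns
    print(t_rule_ok(15, 3, 2))   # True: just meets the necessary condition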

E.g., the SES model (Fig. 4.4, p. 84) meets the t-rule (df1 = df2 = 15), but that doesn't mean it's identifiable --- we'll reconsider this model with other rules

$B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ \beta_{31} & \beta_{32} & 0 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \\ 0 & 0 \end{bmatrix}, \quad \Psi = \begin{bmatrix} \psi_{11} & & \\ \psi_{21} & \psi_{22} & \\ \psi_{31} & \psi_{32} & \psi_{33} \end{bmatrix}, \quad \Phi = \begin{bmatrix} \phi_{11} & \\ \phi_{21} & \phi_{22} \end{bmatrix}$

x1: INC, x2: OCC; y1: SUBINC, y2: SUBOCC, y3: SUBSES

[Path diagram: INC and OCC predicting SUBINC and SUBOCC, which in turn predict SUBSES; errors e_inc, e_occ, e_ses on the three endogenous variables]

Identification: null B rule

- All DVs are influenced only by IVs, not by other DVs (i.e., there are no mediators), so the fundamental equation reduces to the multivariate regression model (with some regression coefficients possibly constrained to zero):

  $y = \Gamma x + \zeta$

- Consequently, the implied covariance matrix reduces to:

  $\Sigma(\theta) = \begin{bmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{bmatrix} = \begin{bmatrix} \Gamma\Phi\Gamma' + \Psi & \Gamma\Phi \\ \Phi\Gamma' & \Phi \end{bmatrix}$

- And all parameter sets (unknowns) are identified as:

  $\Phi = \Sigma_{xx}, \quad \Gamma = \Sigma_{xy}'\Sigma_{xx}^{-1}, \quad \Psi = \Sigma_{yy} - \Sigma_{xy}'\Sigma_{xx}^{-1}\Sigma_{xy}$
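These closed-form identities are easy to verify numerically; a minimal sketch, reusing the toy Γ, Φ, Ψ from above with B = 0:

    import numpy as np

    G   = np.array([[0.8, 0.2], [0.1, 0.7], [0.0, 0.0]])
    Phi = np.array([[1.0, 0.4], [0.4, 1.0]])
    Psi = np.diag([0.5, 0.6, 0.4])

    # Implied covariance blocks under B = 0
    Syy = G @ Phi @ G.T + Psi
    Syx = G @ Phi
    Sxx = Phi

    # Recover every parameter from the "knowns" alone
    Phi_hat   = Sxx
    Gamma_hat = Syx @ np.linalg.inv(Sxx)     # Sigma_yx Sigma_xx^{-1}
    Psi_hat   = Syy - Gamma_hat @ Syx.T      # Sigma_yy - Sigma_yx Sigma_xx^{-1} Sigma_xy

    assert np.allclose(Phi_hat, Phi)
    assert np.allclose(Gamma_hat, G)
    assert np.allclose(Psi_hat, Psi)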

- B = 0 is a sufficient condition for identification, but not a necessary one, so this rule doesn't tell whether models with B ≠ 0 are identifiable
- Furthermore, if all entries of Γ are unconstrained (i.e., the multivariate linear regression model), models with B = 0 are just identified, implying a perfect fit
- How is a perfect fit possible if the residual variances ($\psi_{11}, \ldots, \psi_{pp}$) in the regression model are not zero, unless the DVs are exactly linear combinations of the IVs (i.e., Ψ = 0)?
- Obviously, the SES model doesn't meet the null-B rule

Identification: recursive rule

- All (fully) recursive models are identifiable: B is lower triangular and Ψ is diagonal
- Consider a single equation for $y_i$ in $y = By + \Gamma x + \zeta$:

  $y_i = \beta_{i1}y_1 + \cdots + \beta_{i,i-1}y_{i-1} + \gamma_{i1}x_1 + \cdots + \gamma_{iq}x_q + \zeta_i = [\beta_i', \gamma_i']\,z_i + \zeta_i$

  where $\beta_i$ and $\gamma_i$ are the non-zero elements in the i-th rows of B and Γ, and $z_i$ collects the corresponding y and x variables
- Postmultiplying both sides by $z_i'$ and taking expectations:

  $E(y_i z_i') = [\beta_i', \gamma_i']\,E(z_i z_i') + E(\zeta_i z_i')$

Since $\zeta_i$ (the error term of $y_i$) is orthogonal to $z_i$ (the predictors of $y_i$), $E(\zeta_i z_i') = 0$ and

$\sigma_{y_i z_i}' = [\beta_i', \gamma_i']\,\Sigma_{z_i z_i} \quad\Rightarrow\quad [\beta_i', \gamma_i'] = \sigma_{y_i z_i}'\,\Sigma_{z_i z_i}^{-1}$

--- a regression model form

Likewise, by postmultiplying the $y_i$ equation by its own transpose and taking expectations (and substituting the above), we have

$E(y_i y_i) = [\beta_i', \gamma_i']\,\Sigma_{z_i z_i}\begin{bmatrix}\beta_i \\ \gamma_i\end{bmatrix} + \psi_{ii}$

$\psi_{ii} = \sigma_{y_i y_i} - \sigma_{y_i z_i}'\,\Sigma_{z_i z_i}^{-1}\,\sigma_{y_i z_i}$

- The estimating equation for $\beta_i$ and $\gamma_i$ is the OLS estimator (with some y's serving as IVs), and so with proper distributional assumptions on $y_i$ we can do statistical testing as in regression analysis
- The recursive rule is sufficient but not necessary, and so some models with non-triangular B and/or non-diagonal Ψ may still be identifiable; is the recursive rule met for the SES model?

[Path diagram of the SES model: x1, x2 → y1, y2 → y3, with errors ζ1, ζ2, ζ3]
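A sketch of this equation-by-equation recovery on the toy recursive model from the earlier sketches (all values illustrative): the $y_3$ equation has predictors $z_3 = (y_1, y_2)$, and the diagonal Ψ makes $\zeta_3$ orthogonal to $z_3$, so the formulas reduce to OLS on the implied covariances:

    import numpy as np

    B   = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.5, 0.3, 0.0]])
    G   = np.array([[0.8, 0.2], [0.1, 0.7], [0.0, 0.0]])
    Phi = np.array([[1.0, 0.4], [0.4, 1.0]])
    Psi = np.diag([0.5, 0.6, 0.4])

    A   = np.linalg.inv(np.eye(3) - B)
    Syy = A @ (G @ Phi @ G.T + Psi) @ A.T     # cov(y), ordered y1, y2, y3

    # Equation for y3: regress on z3 = (y1, y2) using population covariances
    Szz   = Syy[:2, :2]
    s_yz  = Syy[2, :2]
    beta3 = np.linalg.solve(Szz, s_yz)        # recovers (beta_31, beta_32) = (.5, .3)
    psi33 = Syy[2, 2] - s_yz @ beta3          # recovers psi_33 = .4
    print(beta3, psi33)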

Identification: order and rank conditions

- Useful for nonrecursive models
- No specific condition on B except that $(I - B)$ is nonsingular, so that $(I - B)^{-1}$ exists
- Identification is considered one equation at a time
- No restriction on Ψ; these rules would not necessarily apply to cases of restricted Ψ (e.g., diagonal), in that a restricted Ψ may help identify otherwise unidentifiable models

Order condition

- Consider the equation for $y_i$, similar to the one before:

  $y_i = [\beta_i', \gamma_i']\,z_i + \zeta_i$

  where $\beta_i$ is row i of B without the i-th element, $\gamma_i$ is row i of Γ, and accordingly $z_i$ collects all y and x variables except $y_i$
- Postmultiplying both sides by $x'$ and taking expectations gives the q covariances between $y_i$ and x (all IVs):

  $\sigma_{y_i x}' = [\beta_i', \gamma_i']\,\Sigma_{z_i x}$

  which is a system of q equations with p − 1 + q unknowns

- Consequently, if p − 1 or more of the unknowns are excluded by zero constraints, all free parameters of the $y_i$ equation are identifiable
- All equations' order conditions can be checked collectively with

  $C = [\,(I - B) \mid -\Gamma\,]$

  If each row of C has p − 1 or more zeros, it passes the order condition; is it satisfied for the SES model?

  $C = \begin{bmatrix} 1 & 0 & 0 & -\gamma_{11} & -\gamma_{12} \\ 0 & 1 & 0 & -\gamma_{21} & -\gamma_{22} \\ -\beta_{31} & -\beta_{32} & 1 & 0 & 0 \end{bmatrix}$

- Useful for ruling out underidentified models with unconstrained Ψ, since it is necessary but not sufficient
- A case where the order condition is met for each individual equation, yet the model is not uniquely identifiable (e.g., in pp. 100-101):

  $C = \begin{bmatrix} 1 & -\beta_{12} & -\gamma_{11} & 0 \\ -\beta_{21} & 1 & 0 & 0 \end{bmatrix}$

- By taking a linear combination of the two rows (e.g., adding a times row 2 to row 1 and renormalizing), we obtain an alternative solution for the first equation with the same form (the same zero pattern), implying the solution is non-unique:

  $C^* = \begin{bmatrix} 1 - a\beta_{21} & -\beta_{12} + a & -\gamma_{11} & 0 \\ -\beta_{21} & 1 & 0 & 0 \end{bmatrix}$

[Path diagram: x1, x2 and y1 ⇄ y2 with errors ζ1, ζ2]
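This non-uniqueness can be exhibited numerically; a sketch (all parameter values are illustrative) that premultiplies the system $(I-B)y = \Gamma x + \zeta$ by an admissible mixing matrix and obtains a different parameter set with an identical implied Σ:

    import numpy as np

    def implied_sigma(B, G, Phi, Psi):
        A = np.linalg.inv(np.eye(B.shape[0]) - B)
        Syy = A @ (G @ Phi @ G.T + Psi) @ A.T
        Syx = A @ G @ Phi
        return np.block([[Syy, Syx], [Syx.T, Phi]])

    b12, b21, g11 = 0.4, 0.6, 0.9
    B   = np.array([[0.0, b12], [b21, 0.0]])
    G   = np.array([[g11, 0.0], [0.0, 0.0]])     # x2 excluded from both equations
    Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
    Psi = np.array([[0.5, 0.1], [0.1, 0.6]])     # unconstrained Psi

    # Mix equation 2 into equation 1 with weight a, then rescale so the
    # diagonal of (I - B*) stays 1; all zero restrictions are preserved
    a = 0.25
    M = np.diag([1.0 / (1.0 - a * b21), 1.0]) @ np.array([[1.0, a], [0.0, 1.0]])
    B2, G2, Psi2 = np.eye(2) - M @ (np.eye(2) - B), M @ G, M @ Psi @ M.T

    assert not np.allclose(B2, B)                # genuinely different parameters
    assert np.allclose(implied_sigma(B, G, Phi, Psi),
                       implied_sigma(B2, G2, Phi, Psi2))   # same Sigma(theta)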

- For the second equation, no linear combination of the two rows can produce an alternative solution with the same form as the second row (i.e., with the last two entries zero), which implies that the second equation is uniquely identifiable
- The insufficiency of the order condition leads to the rank condition

Rank condition

- From the C of the order condition, delete all columns with a non-zero entry in row i, and call the submatrix of the remaining columns $C_i$
- The i-th equation is identified if rank($C_i$) = p − 1
- This guarantees that the p − 1 remaining equations are linearly independent (and independent of equation i as well), so no linear combination of them can mimic equation i
- The rank condition is necessary and sufficient for identification, but it doesn't take into account any restrictions on Ψ; some models with restricted Ψ may be identifiable even when the rank condition is not met

For the SES model on p. 84, is the rank condition met for each equation?

$C = \begin{bmatrix} 1 & 0 & 0 & -\gamma_{11} & -\gamma_{12} \\ 0 & 1 & 0 & -\gamma_{21} & -\gamma_{22} \\ -\beta_{31} & -\beta_{32} & 1 & 0 & 0 \end{bmatrix}$

How about the example from the order condition?

$C = \begin{bmatrix} 1 & -\beta_{12} & -\gamma_{11} & 0 \\ -\beta_{21} & 1 & 0 & 0 \end{bmatrix}$
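Both conditions are mechanical enough to code; a sketch (function names are mine) that checks them on numeric instances of C, using arbitrary nonzero stand-ins for the free parameters:

    import numpy as np

    def order_condition(C):
        """Each row of C = [(I - B) | -Gamma] needs at least p - 1 zeros."""
        p = C.shape[0]
        return [(row == 0).sum() >= p - 1 for row in C]

    def rank_condition(C):
        """Equation i is identified (with unrestricted Psi) iff the columns
        of C that are zero in row i form a submatrix of rank p - 1."""
        p = C.shape[0]
        return [np.linalg.matrix_rank(C[:, C[i] == 0]) == p - 1 for i in range(p)]

    # SES model, with placeholder values for the free parameters
    C_ses = np.array([[ 1.0,  0.0, 0.0, -0.8, -0.2],
                      [ 0.0,  1.0, 0.0, -0.1, -0.7],
                      [-0.5, -0.3, 1.0,  0.0,  0.0]])
    print(order_condition(C_ses), rank_condition(C_ses))   # all True

    # pp. 100-101 example: order condition passes, rank fails for equation 1
    C_ex = np.array([[ 1.0, -0.4, -0.9, 0.0],
                     [-0.6,  1.0,  0.0, 0.0]])
    print(order_condition(C_ex), rank_condition(C_ex))     # [True, True], [False, True]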

A little more on identification

- Inequality constraints may or may not help identification, since they limit the range of possible values of a free parameter rather than fixing it to a particular value
- Constant values, equality constraints, and linear combinations of other parameters are also forms of restriction that affect identification, but the identification rules considered so far assume only zero constraints (except for the t-rule)
- Identification as discussed so far concerns only whether the equation systems are solvable; it does not guarantee convergence or the properness of a solution (e.g., negative $\psi_{ii}$)
- The identification rules considered here apply to a part of the general model, not to the measurement-model part