General structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models

Similar documents
Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

SEM for Categorical Outcomes

Comparing IRT with Other Models

Bayesian Analysis of Latent Variable Models using Mplus

Nesting and Equivalence Testing

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh

Citation for published version (APA): Jak, S. (2013). Cluster bias: Testing measurement invariance in multilevel data

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Introduction to Statistical Analysis

Plausible Values for Latent Variables Using Mplus

Structural Equation Modeling and Confirmatory Factor Analysis. Types of Variables

Model Estimation Example

Assessment, analysis and interpretation of Patient Reported Outcomes (PROs)

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Correlations with Categorical Data

Generalized Linear Models for Non-Normal Data

Data Analysis as a Decision Making Process

Factor Analysis. Qian-Li Xue

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

What is Latent Class Analysis. Tarani Chandola

Dimensionality Assessment: Additional Methods

Centering Predictor and Mediator Variables in Multilevel and Time-Series Models

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Generalized Linear Latent and Mixed Models with Composite Links and Exploded

WHAT IS STRUCTURAL EQUATION MODELING (SEM)?

A Multilevel Analysis of Graduates Job Satisfaction

Confirmatory Factor Analysis: Model comparison, respecification, and more. Psychology 588: Covariance structure and factor models

Mixture Modeling. Identifying the Correct Number of Classes in a Growth Mixture Model. Davood Tofighi Craig Enders Arizona State University

Assessing Factorial Invariance in Ordered-Categorical Measures

Measurement Invariance (MI) in CFA and Differential Item Functioning (DIF) in IRT/IFA

Mixed Models for Longitudinal Ordinal and Nominal Outcomes

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Ridge Structural Equation Modeling with Correlation Matrices for Ordinal and Continuous Data. Ke-Hai Yuan University of Notre Dame

STATISTICS ( CODE NO. 08 ) PAPER I PART - I

BMC Medical Research Methodology

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

Using Mixture Latent Markov Models for Analyzing Change in Longitudinal Data with the New Latent GOLD 5.0 GUI

INTRODUCTION TO STRUCTURAL EQUATION MODELS

Summer School in Applied Psychometric Principles. Peterhouse College 13 th to 17 th September 2010

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Introduction to Structural Equation Modeling

Generalized Models: Part 1

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Using Bayesian Priors for More Flexible Latent Class Analysis

Maximum Likelihood Estimation. only training data is available to design a classifier

Introduction to Generalized Models

STA 216, GLM, Lecture 16. October 29, 2007

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Factor Analysis and Latent Structure of Categorical Data

Chapter 16: Correlation

Whats beyond Concerto: An introduction to the R package catr. Session 4: Overview of polytomous IRT models

Final Exam - Solutions

Stat 642, Lecture notes for 04/12/05 96

SRMR in Mplus. Tihomir Asparouhov and Bengt Muthén. May 2, 2018

SEM with Non Normal Data: Robust Estimation and Generalized Models

Chapter 8. Models with Structural and Measurement Components. Overview. Characteristics of SR models. Analysis of SR models. Estimation of SR models

Strati cation in Multivariate Modeling

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

Outline

Contents. Acknowledgments. xix

Using SEM to detect measurement bias in dichotomous responses

Growth models for categorical response variables: standard, latent-class, and hybrid approaches

Lecture 5 Models and methods for recurrent event data

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Introduction to inferential statistics. Alissa Melinger IGK summer school 2006 Edinburgh

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Statistics: A review. Why statistics?

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution

Chi-Square. Heibatollah Baghi, and Mastee Badii

8 Nominal and Ordinal Logistic Regression

6348 Final, Fall 14. Closed book, closed notes, no electronic devices. Points (out of 200) in parentheses.

Basic IRT Concepts, Models, and Assumptions

Lecture 6: Gaussian Mixture Models (GMM)

Calculating Effect-Sizes. David B. Wilson, PhD George Mason University

Model fit evaluation in multilevel structural equation models

Title: Testing for Measurement Invariance with Latent Class Analysis. Abstract

Ruth E. Mathiowetz. Chapel Hill 2010

Introducing Generalized Linear Models: Logistic Regression

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Old and new approaches for the analysis of categorical data in a SEM framework

Comparison between conditional and marginal maximum likelihood for a class of item response models

Relating Latent Class Analysis Results to Variables not Included in the Analysis

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS

Continuous Time Survival in Latent Variable Models

Psychology 282 Lecture #4 Outline Inferences in SLR

Bayes methods for categorical data. April 25, 2017

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Mplus Short Courses Topic 2. Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes

MODEL IMPLIED INSTRUMENTAL VARIABLE ESTIMATION FOR MULTILEVEL CONFIRMATORY FACTOR ANALYSIS. Michael L. Giordano

Multilevel Structural Equation Modeling

COSINE SIMILARITY APPROACHES TO RELIABILITY OF LIKERT SCALE AND ITEMS

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Lecture Notes 2: Variables and graphics

The comparison of estimation methods on the parameter estimates and fit indices in SEM model under 7-point Likert scale

Glossary for the Triola Statistics Series

Transcription:

General structural model Part 2: Categorical variables and beyond Psychology 588: Covariance structure and factor models

Categorical variables 2 Conventional (linear) SEM assumes continuous observed variables (except for exogenous x) --- thus, SE modeling of categorical variables not fully justifiable Empirical (discretized) vs. conceptual categories: Length measured in quarter-inch intervals # of deaths for heart failure Political affiliation, ethnicity Color Dichotomies as quantitative variables --- dichotomous (and polytomous) variables used for quantification of nominal variables and any quantitative analysis/interpretation with them meaningful up to distinction of the categories

Why problem? 3 Discretized variables are necessarily censored at the tails and center becomes taller with fewer categories --- deviation from normality gets severe with 2 or 3 categories If continuous variable discretized, is it polytomous or ordinal? Crude measurement (too much rounding) --- increased measurement error Individual differences in where to put thresholds --- may create some systematic tendency (bias) or add more measurement error at best Following histograms show effects on kurtosis by even-interval categorization (N = 3)

15 continuous, b 2 =2.935 15 11 cat, b 2 =2.97 15 9 cat, b 2 =2.878 1 1 1 5 5 5-4 -2 2 4-4 -2 2 4-4 -2 2 4 15 7 cat, b 2 =2.968 15 6 cat, b 2 =2.823 15 5 cat, b 2 =3. 1 1 1 5 5 5-4 -2 2 4-4 -2 2 4-4 -2 2 4 15 4 cat, b 2 =2.732 15 3 cat, b 2 =7.5 15 2 cat, b 2 =1. 1 1 1 5 5 5-4 -2 2 4-4 -2 2 4-4 -2 2 4

Consequences 5 Suppose a linear structure holds for true, unobserved continuous indicators y * as: * y Λη y ε then the categorized indicators y don t agree with the model: y Λη ε, Σ Σθ biased θˆ y acov * * s, acov, ij sgh sij sgh invalid stat testing

Simulation results 6 Excessive kurtosis and skewness created by categorization result in too large chi-square (more rejection of correct parsimonious models than it should) and too large SE (more rejection of correct non-zero θ) Chi-square estimates tend to be more influenced by excessive kurtosis and skewness than by # of categories Generally coefficients (β and γ) and loadings are attenuated toward --- in that categorization adds measurement errors When unobserved continuous indicators are highly correlated, categorization into few categories may artificially increase factorial complexity (resulting in correlated errors) --- since mis-classifying has a bigger consequence (than less correlated cases) and the consequence is likely to vary by variables

Correction of Σ 7 Assuming the unobserved, continuous y * takes certain distributional form (most often normal), Σ * (i.e., tetrachoric or polychoric correlations) may be estimated based on observed proportions at bivariate combinations of categories, by maximizing the likelihood: ln c d L A N i 1 j 1 ij ln ij a, b a, b a, b a, b ij 2 i j 2 i 1 j 2 i j 1 2 i 1 j 1 where N ij and π ij are, respectively, frequency and probability at the ij-th category of y 1 and y 2 ; Φ 2 is CDF of bivariate normal distribution; and a i and b j are thresholds for the ij-th category

Any continuous y is used as observed so that the entries of Σ * are Pearson, polyserial (biserial) or polychoric (tetrachoric) correlations ML estimation of these correlations requires intensive computation --- thus, unstable with small samples Given Σ *, the usual SEM estimators will provide consistent estimates of θ, but WLS is recommended for correct statistical testing --- available in PRELIS (included in LISREL) See the examples, Tables 9.6 & 9.8

Nonlinear measurement models 9 Relationship between observed and latent variables is defined as, e.g., the logistic or ogive function: If y * is normal, Pr(y < c) follows the normal CDF (ogive function) with varying central locations Assuming only one latent variable, it becomes graded item response or 2 parameter logistic model The generalized latent variable modeling approach allows for such nonlinear relationships, along with other relationships for counts and duration (survival), by adopting the generalized linear modeling (GLM) approach --- offered e.g., by Mplus

Comprehensive treatment of the generalized modeling approach --- Skrondal A. & Rabe-Hesketh S. (24). Generalized latent variable modeling, CRC Short introduction --- Muthen B.O. (22). Beyond SEM: General latent variable modeling. Behaviormetrika, 29, 81-117. (available in the course website)

Further developments of SEM 11 Latent growth curve modeling Multilevel SEM for hierarchically designed data Categorical latent variables When one latent categorical variable assumed with multiple categorical indicators, it becomes latent class model More general modeling framework is what s known as finite mixture modeling --- possible with continuous indicators It yields probabilistic membership as latent variable scores Such idea of latent clusters can be applied to any SEM modeling approaches