Identifying and accounting for outliers and extreme response patterns in latent variable modelling


Irini Moustaki, Athens University of Economics and Business

Outline 1. Define the problem of outliers and atypical response patterns. 2. Treat aberrant response patterns within a LVM framework: Forward Search and Backward Search; mixture model; robust ML estimation. 3. Applications.

Response strategies 1. Main response strategy: Responses to educational and psychological tests are supposed to be based on latent constructs. The strategy is used by the majority of the respondents and can be modelled using an IRT model. 2. Alternative response strategy: Based on guessing and randomness. Secondary strategies are modelled as separate latent classes. Hybrid model for binary responses (Yamamoto, 1989) and for ordinal responses (Nelson, 1999).

Secondary response strategies - Outliers Categorical responses 1. The agreement response style: tendency to agree/disagree with all questions regardless of their content. 2. The extreme response style: tendency for some individuals to use the extreme ends of a Likert-type response scale (e.g. strongly agree or strongly disagree). 3. The neutral response style: tendency to choose the middle alternative of a Likert-type scale. Continuous responses: a response more than 3 standard deviations away from the mean is considered an unexpected response under the normal model.
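As a quick illustration of the continuous-response rule above, a minimal sketch in Python (the data values are made up). Note that the mean and standard deviation used here are themselves non-robust, which foreshadows the masking problem discussed later:

```python
import numpy as np

def flag_outliers_3sd(y):
    """Flag responses more than 3 standard deviations from the mean."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()
    return np.abs(z) > 3

# Hypothetical scores: nineteen typical values and one extreme response
scores = np.array([50.0] * 19 + [200.0])
print(flag_outliers_3sd(scores))  # only the last response is flagged
```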

Aberrant response patterns are identified using person-fit statistics, residuals, or latent class analysis, as long as an expected response style is adopted as a benchmark (e.g. Guttman scale). Those response styles should either be treated as outliers, and therefore controlled for or removed from the estimation process, or be treated as a manifestation of certain respondent characteristics. In the literature, interest lies in how response styles differ among groups defined by demographic and socio-economic variables.

Searching for outliers and extreme response patterns ML estimation assumes that the responses are exactly generated by the true model. Proposed a robust estimator that does not break down when outliers are present in the data set, for the latent variable model with mixed binary and continuous responses (Moustaki and Victoria-Feser, 2006, JASA). Implement algorithms such as the forward search (Hadi 1992, Atkinson 1994) for identifying outliers (Mavridis, PhD thesis). Use the latent variable model to accommodate outliers in the data. Predict and account for the proportion of individuals that guessed a response pattern (Martin Knott).

Theoretical Framework for LVM Bartholomew and Knott (1999) A LVM models the associations among a set of p observed variables (y_1, y_2, ..., y_p) using q latent variables (z_1, z_2, ..., z_q), where q is much less than p. As only y can be observed, any inference must be based on the joint distribution of y: f(y) = ∫_{R_z} g(y | z) φ(z) dz, where φ(z) is the prior distribution of z and g(y | z) the conditional distribution of y given z. What we want to know: φ(z | y).

If the correlations among the y's can be explained by a set of latent variables, then once all z's are accounted for, the y's will be independent (local independence). q must be chosen so that: g(y | z) = ∏_{i=1}^p g(y_i | z). The question is whether f(y) admits the representation f(y) = ∫_{R_z} ∏_{i=1}^p g(y_i | z) h(z) dz for some small value of q.

Identifying aberrant responses in factor analysis for continuous responses y_i = μ_i + Σ_{j=1}^q λ_ij z_j + e_i, i = 1, ..., p (1). z is called the common factor since it is common to all the y_i's. The e_i's are sometimes called specific factors since each is unique to a particular y_i.

Cov(e, z) = 0, y = μ + Λz + e, e ∼ N(0, Ψ). We also assume that e_1, e_2, ..., e_p are independent, so that y_1, y_2, ..., y_p are conditionally independent given z. Then we can make deductions about the distribution of the y's and, in particular, about their covariances and correlations. We can choose the scale and origin of z as we please, because this does not affect the form of the regression equation. All parameter estimation techniques minimize some function of the distance between the sample covariance matrix S and the covariance matrix under the model, Σ(θ).
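The covariance structure implied by this model, Σ(θ) = ΛΛ' + Ψ, can be checked by simulation; a sketch with illustrative loadings and unique variances (the parameter values are assumptions made for this example, not estimates from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 5, 1, 200_000

# Illustrative parameter values (assumptions for this sketch)
Lam = np.array([[0.6], [0.7], [0.8], [0.7], [0.9]])    # loadings (p x q)
Psi = np.diag([0.5, 0.4, 0.35, 0.45, 0.2])             # unique variances

z = rng.standard_normal((n, q))                        # latent factors
e = rng.multivariate_normal(np.zeros(p), Psi, size=n)  # unique errors
y = z @ Lam.T + e                                      # mu = 0 for simplicity

S = np.cov(y, rowvar=False)        # sample covariance
Sigma = Lam @ Lam.T + Psi          # model-implied covariance
print(np.abs(S - Sigma).max())     # small for large n
```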

Methods for identifying outliers Model-free methods: graphs and distance measures: D(m) = (y_m − T(y))' C_y^{-1} (y_m − T(y)), m = 1, ..., n, where T(y) is a location parameter and C_y a dispersion parameter. Mahalanobis distance: substituting T(y) with the arithmetic mean and C_y with the sample variance-covariance matrix gives MD(m) = (y_m − ȳ)' S_y^{-1} (y_m − ȳ), m = 1, ..., n. Both ȳ and S_y are sensitive to the presence of outliers.
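The Mahalanobis distance above is straightforward to compute; a minimal sketch using the non-robust sample mean and covariance (the data are simulated, with one planted outlier):

```python
import numpy as np

def mahalanobis_sq(Y):
    """Squared Mahalanobis distance of each row of Y from the sample
    mean, using the (non-robust) sample mean and covariance."""
    Y = np.asarray(Y, dtype=float)
    D = Y - Y.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Y, rowvar=False))
    return np.einsum('ij,jk,ik->i', D, S_inv, D)

# Hypothetical data: 99 clean points and one planted outlier at row 0
rng = np.random.default_rng(1)
Y = rng.standard_normal((100, 3))
Y[0] = 10.0
print(mahalanobis_sq(Y).argmax())  # the planted outlier has the largest distance
```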

These multivariate location and dispersion estimates are not robust: the breakdown point of the sample estimator is 0%. Yuan and Bentler (2001) and Moustaki and Victoria-Feser (2006) show, theoretically and via a simulation study, that the sample covariance matrix has an unbounded influence function and zero breakdown point. Robust estimators for the location and dispersion parameters have been proposed (Gnanadesikan and Kettenring, 1972; Campbell, 1980; Rousseeuw, 1985; Rousseeuw and van Zomeren, 1990). Robust estimates for the FA model: use a robust sample covariance matrix (Filzmoser, 1999; Pison et al., 2003); examination of residuals.

Detection errors 1. Outliers may be responsible for improper solutions. 2. Swamping effect: an observation is considered an outlier when it is not. 3. Masking effect: an outlier is not detected (typically with a cluster of outliers).

Backward Search Hadi (1992), Hadi and Simonoff (1993), Atkinson (1994), Atkinson and Riani (2000), Atkinson, Riani and Cerioli (2004) 1. Start with the whole data set. 2. Compute a measure of outlyingness. 3. Delete the observation that appears most extreme according to the criterion used. 4. Iterate until a certain number of observations has been excluded.
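A generic distance-based sketch of the backward deletion loop. The cited papers use model-based criteria; Mahalanobis distance simply stands in here for "outlyingness":

```python
import numpy as np

def backward_search(Y, n_remove):
    """Repeatedly delete the observation with the largest Mahalanobis
    distance, recomputing the mean and covariance after each deletion.
    Returns the original indices of the deleted observations, in order."""
    Y = np.asarray(Y, dtype=float)
    idx = np.arange(len(Y))
    removed = []
    for _ in range(n_remove):
        D = Y - Y.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(Y, rowvar=False))
        d = np.einsum('ij,jk,ik->i', D, S_inv, D)
        worst = int(d.argmax())
        removed.append(int(idx[worst]))
        Y = np.delete(Y, worst, axis=0)
        idx = np.delete(idx, worst)
    return removed

rng = np.random.default_rng(2)
Y = rng.standard_normal((80, 2))
Y[0] = 8.0                          # one planted outlier at row 0
print(backward_search(Y, 3)[0])     # the outlier is deleted first
```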

Forward Search 1. Start with an initial outlier free subset of the data (basic set) of size g. 2. Order the observations in the non-basic set according to their closeness to the basic set. 3. Add the least aberrant observation to the initial subset 4. Repeat the process until all observations are included. 5. Monitoring the search.
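A distance-based sketch of the forward loop (steps 1-4). The actual search orders observations by likelihood contributions under the fitted factor model; Mahalanobis distance to the basic set is used here only to illustrate the mechanics:

```python
import numpy as np

def forward_search(Y, basic_idx):
    """Grow an initial outlier-free basic set one observation at a time,
    always adding the closest remaining point (smallest Mahalanobis
    distance to the basic set). Returns the full entry order."""
    Y = np.asarray(Y, dtype=float)
    basic = list(basic_idx)
    rest = [i for i in range(len(Y)) if i not in basic]
    while rest:
        B = Y[basic]
        D = Y[rest] - B.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(B, rowvar=False))
        d = np.einsum('ij,jk,ik->i', D, S_inv, D)
        basic.append(rest.pop(int(d.argmin())))
    return basic

rng = np.random.default_rng(3)
Y = np.vstack([rng.standard_normal((50, 2)),
               np.full((3, 2), 8.0)])        # indices 50-52 are outliers
order = forward_search(Y, range(10))         # clean initial subset
print(order[-3:])                            # the outliers enter last
```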

Choice of the initial subset There are typically an enormous number of subsets of size g: C(n, g). If there are t outliers, the probability of selecting a clean initial sample is: C(t, 0) C(n − t, g) / C(n, g). 1. Draw J sub-samples of size g, where g > p(p + 1)/2. 2. For each subsample a factor analysis model is fitted and the model parameters θ_k = (λ_k, Ψ_k), k = 1, ..., J, are estimated. 3. For each subsample an objective function F(y, θ_k) is computed. Choices of objective function: median of the likelihood contributions, min trace((S − Σ(θ̂_k))²), max log-likelihood.
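The probability of a clean initial subset can be evaluated directly from the hypergeometric expression above; a small sketch (the sample sizes are chosen for illustration):

```python
from math import comb

def prob_clean_subset(n, t, g):
    """P(a random subset of size g, drawn from n observations of which
    t are outliers, contains no outlier) = C(n - t, g) / C(n, g)."""
    return comb(n - t, g) / comb(n, g)

# e.g. n = 100 observations, t = 20 outliers, initial subset of g = 16:
print(prob_clean_subset(100, 20, 16))  # about 0.02, so many subsets are drawn
```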

Progressing in the search 1. Order the observations in the non-basic set using the estimated parameters from the basic set. 2. Statistics used for the ordering: likelihood contributions, residuals, Mahalanobis distances. 3. The next observation to enter the basic set is the one with the largest likelihood contribution, the smallest residual, etc. 4. Stepwise procedure - units might interchange.

Monitoring the search Searching for sharp changes in: parameter estimates, t-tests, goodness-of-fit statistics, measures of fit, residuals, Cook-type statistics.

General remarks In most of the examples we have looked at, the BS and the FS pick out the same observations as outliers. The BS is less time-consuming than the FS. Do outliers always enter in the last steps? In the example to follow, none of the contaminated observations was excluded (masking effect) when the BS was used. Analysis of residuals did not identify the contaminated data.

Application Exam grades of 100 students on 5 exams. The first two exams are related to mathematics, the next two to literature, and the last one is a comprehensive exam.

Exam   one-factor model    two-factor model
       λ̂      Ψ̂          λ̂1     λ̂2      Ψ̂
1      0.602  0.637       0.628   0.348   0.482
2      0.668  0.553       0.699   0.328   0.403
3      0.770  0.406       0.778  -0.206   0.351
4      0.720  0.481       0.724  -0.207   0.432
5      0.915  0.162       0.896  -0.047   0.194

The asymptotic p-value for the LR test is 0.0332 for the one-factor model and 0.7061 for the two-factor model.

Contaminating the data set The first 80 individuals of the grades example, after sorting by the FS, were selected to represent the outlier-free part of the data. The responses of 20 individuals are generated from a multivariate normal distribution N(μ, I), where μ ∼ N(μ_80, 15²). The contaminated observations are labelled 1 to 20 in the data set.

Forward Search and results A FS was conducted using likelihood contributions for adding new observations to the basic set. The initial sample was selected by examining 1000 subsets of size 16 with the ULS criterion. None of the initial samples contained outliers. The search consists of 84 steps; the 20 contaminants did not enter in the last 20 steps but started entering at step 51, with the last entering at step 79.

Figure 1: Plot of residuals against individuals.

Figure 2: Various plots of fit over the search (LRS, WLSC, ULSC, Psi), illustrating the swamping and masking effects, with contaminated units marked.

Modelling Secondary Response Strategies Binary Data The response pattern with all 1's is denoted y^e. The remaining 2^p − 1 patterns are denoted y^ē. A pseudo item is used to indicate whether an extreme response pattern y^e was guessed. The pseudo item is denoted u; it takes the value 1 when a response pattern is guessed and 0 otherwise. Note that the pseudo item is not observed in the data, since we do not know in advance how many extreme response patterns have been guessed.

Table 1: Response mechanism

                            Guessing (u = 1)   No guessing (u = 0)   Total
Extreme response (y^e)      n_{e,g}            n_{e,ḡ}               n_e
Non-extreme response (y^ē)  0                  n_{ē,ḡ}               n_ē
Total                       n_g                n_ḡ                   n

Individuals who have given a non-extreme response pattern have not guessed, P(u = 1 | y^ē) = 0, and therefore P(u = 0 | y^ē) = 1. For individuals who guessed, the probability of responding with an extreme response pattern is one: P(y^e | u = 1) = 1.

Modelling the guessing response mechanism We define the distributions of the responses to the items (y_1, ..., y_p) and the unobserved guessing item u conditional on a single latent variable z. Under the assumption of conditional independence:

f(y^e, u = 0 | z) = [∏_{i=1}^p f_{y_i}(y_i^e | z)] f_u(u = 0 | z)   (2)
f(y^e, u = 1 | z) = f_u(u = 1 | z)   (3)
f(y^ē, u = 0 | z) = [∏_{i=1}^p f_{y_i}(y_i^ē | z)] f_u(u = 0 | z)   (4)
f(y^ē, u = 1 | z) = 0   (5)

The density f_{y_i}(y_i | z) is the conditional distribution of each binary item, taken to be Bernoulli: f_{y_i}(y_i | z) = [π_i(z)]^{y_i} [1 − π_i(z)]^{1 − y_i}, i = 1, ..., p (6), where π_i(z) = P(y_i = 1 | z). The response probability π_i(z) for the p observed items is modelled with the two-parameter logistic model: logit π_i(z) = α_{0i} + α_{1i} z, i = 1, ..., p (7), where the parameters α_{0i} and α_{1i} are the intercepts and factor loadings respectively.
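Equations (6)-(7) are easy to compute directly; a minimal sketch of the two-parameter logistic response probability (the parameter values are illustrative, not estimates from the talk):

```python
import numpy as np

def pi_2pl(z, alpha0, alpha1):
    """Two-parameter logistic model:
    P(y_i = 1 | z) = 1 / (1 + exp(-(alpha0_i + alpha1_i * z)))."""
    return 1.0 / (1.0 + np.exp(-(alpha0 + alpha1 * z)))

alpha0, alpha1 = -0.5, 1.2           # illustrative intercept and loading
print(pi_2pl(0.0, alpha0, alpha1))   # probability at z = 0: about 0.378
```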

The model for the pseudo item u becomes: logit π_u(z) = α_{0,u} + α_{1,u} z + Σ_{j=1}^s β_{j,u} x_j (8), where the β_{j,u} are regression coefficients for the covariates x_1, ..., x_s.

Estimation The complete likelihood for a random sample of size n is: L = ∏_{m=1}^n f(y_m, u_m, z_m) (9). The only observed part is y_m = (y_{1m}, ..., y_{pm}); both the pseudo item u_m and the latent variable z_m are unobserved. The log-likelihood is maximized with the EM algorithm.

Therefore, the conditional expected values of the score functions are taken over the posterior distribution f(u, z | y). For the extreme response patterns:

E{∂ ln f(y^e, u, z)/∂α_{il} | y^e} = ∫ [y_i^e − π_i(z)] z_l f(u = 0, z | y^e) dz,   i = 1, ..., p

and for the pseudo item:

∫ [0 − π_u(z)] z_l f(u = 0, z | y^e) dz + ∫ [1 − π_u(z)] z_l f(u = 1, z | y^e) dz

For the non-extreme response patterns y^ē we have:

E{∂ ln f(y^ē, u, z)/∂α_{il} | y^ē} = ∫ [y_i^ē − π_i(z)] z_l f(u = 0, z | y^ē) dz,   i = 1, ..., p

and for the pseudo item:

∫ [0 − π_u(z)] z_l f(u = 0, z | y^ē) dz

The integrals are approximated using Gauss-Hermite quadrature points z_t with corresponding weights φ(z_t). Alternative approximations can be used (Monte Carlo, Laplace, adaptive quadrature).
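A sketch of the Gauss-Hermite approximation to an expectation over a standard normal latent variable; the rescaling by sqrt(2) converts numpy's exp(-z^2) weight function to the N(0, 1) density. It is checked here on the known value E[z^2] = 1:

```python
import numpy as np

def expect_under_normal(f, n_points=20):
    """Approximate E[f(z)] for z ~ N(0, 1) by Gauss-Hermite quadrature.
    numpy's hermgauss targets the weight exp(-z^2), so the nodes are
    rescaled by sqrt(2) and the result divided by sqrt(pi)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    return np.sum(weights * f(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

print(expect_under_normal(lambda z: z ** 2))   # E[z^2] = 1 for z ~ N(0, 1)
```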

Guessing response process completely at random - a simple case When the odds of guessing depend neither on a latent variable nor on covariates, the terms f_u(u = 1 | z) and f_u(u = 0 | z) become: f_u(u = 1 | z) = η and f_u(u = 0 | z) = 1 − η, where η is the proportion of guessers out of the total sample size and does not depend on the latent variable z. No model is fitted to the guessing/pseudo item.

Taking into account that in any given data set there is a fixed number of observed responses of type y^e, model estimation and updating of the number of guessed response patterns proceed as follows: 1. Select a number of guesses at the extreme pattern, say n_g(y^e); the proportion of guessed responses is then defined as η = n_g(y^e)/n. 2. Fit the one-factor model to the remaining n − n_g(y^e) observations, where n is the total sample size.

3. Update the proportion of guessed extreme responses by computing the conditional probability of guessing given an extreme response pattern: P(u = 1 | y^e) = η / (η + f(y^e | u = 0)(1 − η)) (10), where f(y^e | u = 0) is a conditional probability computed from the model. 4. The allocation of guessers to the extreme response pattern y^e is: n_{g,new}(y^e) = n_e · n_{g,old}(y^e) / [n_{g,old}(y^e) + f(y^e | u = 0)(n − n_{g,old}(y^e))], where n_e is the total number of extreme response patterns. No guessers are allocated to the non-extreme group.
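The update in steps 3-4 can be sketched directly. The value of f(y^e | u = 0) below is a made-up number standing in for the quantity that, in the real algorithm, comes from refitting the one-factor model at each step:

```python
def update_guessers(n, n_g_old, n_e, f_e_given_no_guess):
    """One allocation update for the completely-at-random guessing model:
    the posterior probability of guessing given an extreme pattern,
    eq. (10), scaled up to the n_e observed extreme patterns."""
    eta = n_g_old / n
    p_guess = eta / (eta + f_e_given_no_guess * (1.0 - eta))
    return p_guess * n_e

# Hypothetical inputs: n = 1005, 33 extreme patterns, 20 currently
# allocated to guessers, and f(y^e | u = 0) = 0.005 from the fitted model
print(update_guessers(1005, 20, 33, 0.005))
```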

Model interpretation What do we expect when we adjust for guessed extreme patterns? Improve the fit (study goodness-of-fit measures) Interpret the factor loadings taking into account the existence of outliers Relate guessing mechanism to covariates and identify demographic groups that are more inclined to guess than other groups.

Workplace Industrial Relations Survey Please consider the most recent change involving the introduction of new plant, machinery and equipment. Were discussions or consultations of any of the type on this card held either about the introduction of the change or about the way it was to be implemented? 1. Informal discussion with individual workers. 2. Meetings with groups of workers. 3. Discussions in established joint consultative committee. 4. Discussions in specially constituted committee to consider the change. 5. Discussions with union representatives at the establishment. 6. Discussions with paid union officials from outside.

n = 1005 non-manual workers. The one-factor model gives G² = 269.4 and X² = 264.2 on 32 degrees of freedom. The fit on the univariate, bivariate and trivariate margins suggests that the bad fit is due to item 1.

Table 2: Chi-squared residuals greater than 3 for the second- and third-order (1,1,1) margins for the one-factor model, WIRS data

Response   Items     O     E       |O − E|   (O − E)²/E
(0,0)      2, 1      186   265.75  79.75     23.93
(0,1)      2, 1      233   153.23  79.77     41.52
(1,0)      2, 1      444   364.25  79.75     17.46
           4, 1      172   145.48  26.52      4.84
           4, 2       61    87.00  26.00      7.77
(1,1)      2, 1      142   221.77  79.77     28.69
           4, 1       69    95.65  26.65      7.43
           4, 2      180   154.13  25.87      4.34
(1,1,1)    1, 2, 3    37    75.79  38.79     19.85
           1, 2, 4    23    61.75  38.75     24.32
           1, 2, 5    53    94.85  41.85     18.46
           1, 2, 6    26    40.32  14.32      5.08
           1, 3, 4    30    45.69  15.69      5.39
           1, 4, 5    35    55.73  20.73      7.71
           2, 3, 4    93    75.03  17.97      4.31
           2, 4, 5   108    91.39  16.61      3.02
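The residuals in Table 2 follow the usual (O − E)²/E form; a one-line check against the first row of the table:

```python
def chisq_residual(observed, expected):
    """Chi-squared residual (O - E)^2 / E used to inspect the fit on
    lower-order margins."""
    return (observed - expected) ** 2 / expected

# First row of Table 2: items (2, 1), response (0, 0)
print(round(chisq_residual(186, 265.75), 2))   # 23.93, as in the table
```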

In the original data set (n = 1005) there were only three extreme response patterns (111111) with expected frequency 10.802 under the one-factor model. There were also thirty response patterns of type (011111) with expected frequency 13.883. For the purpose of our analysis, we took response patterns (011111), thirty in total, and changed them to (111111).

Maximum likelihood standardized loadings

Item             No-guessing   Simple Guess   Model Guess
1                0.15          0.43           0.44
2                0.34          0.13           0.13
3                0.86          0.84           0.85
4                0.71          0.56           0.56
5                0.89          0.89           0.90
6                0.80          0.74           0.75
7 (pseudo item)                               0.90

LogL = -3420.065 (no-guessing) and -3414.887 (with guessing).

Observed and expected frequencies for extreme response patterns

Model                                        Response pattern   Observed   Expected
One-factor model                             111111             33         15.747
One-factor with guessing (26 guessed)        111111             7          6.784
One-factor model with guessing model         1111111            27         27.212
                                             1111110            6          5.995

Conclusions The model itself adjusts for guessed extreme response patterns. Extensions to other types of extreme responses. Link and compare this method with other available methods, such as robust estimation and subset regression methods. The FS algorithm has been extended to binary and mixed-type data.