MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS

JAN DE LEEUW

ABSTRACT. We study the weighted least squares fixed rank approximation problem in which the weight matrices depend on unknown parameters. The classical example is fixed score factor analysis (FSFA), where the weights depend on the unknown uniquenesses.

1. INTRODUCTION

Factor Analysis (FA) is a class of techniques to approximate a data matrix by a matrix of the same dimension but of lower rank. In other words, we have an $n \times m$ matrix $X$ and an integer $1 \leq p \leq \min(n,m)$, and we want to find an $n \times p$ matrix $A$ of factor scores and an $m \times p$ matrix $B$ of factor loadings such that $X \approx AB'$. The obvious translation into an optimization problem with a real-valued loss function is to minimize $\|X - AB'\|$ over $A$ and $B$, with some norm or semi-norm. The norm is usually unitarily invariant, and more often than not of the weighted least squares type. A fairly general treatment is in De Leeuw [1984]; also see Zha [1991].

In Common Factor Analysis (CFA) we want, in addition, that the residuals $X - AB'$ are approximately orthogonal by columns. Thus we want to find $A$ and $B$ such that $X \approx AB'$ as well as $\operatorname{offdiag}\{(X - AB')'(X - AB')\} \approx 0$. Because we now want two things, instead of just one, the choice of the loss function becomes much less straightforward. Before we discuss some of the more common loss function alternatives below, we generalize CFA by replacing the requirement of orthogonality of the residuals by the more general requirement that $(X - AB')'(X - AB')$ is approximately in $\mathcal{S}_m$, where $\mathcal{S}_m$ is a subset of the convex cone of positive semi-definite matrices of order $m$. The diagonal matrices defining CFA are just a special case of this.

Date: Wednesday 14th January, 2009.
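In the unweighted case the best rank-$p$ approximation is given by the truncated singular value decomposition. The following is a minimal NumPy sketch of that special case; it is not part of the original note and all names are illustrative.

```python
import numpy as np

def fixed_rank_approximation(X, p):
    """Best rank-p approximation of X in the unweighted least squares sense,
    returned as factor scores A (n x p) and loadings B (m x p) with X ~ A B'."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    A = U[:, :p] * s[:p]   # scores absorb the singular values
    B = Vt[:p, :].T        # loadings
    return A, B

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
A, B = fixed_rank_approximation(X, p=2)
print(np.linalg.norm(X - A @ B.T))  # loss at the minimizer
```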

2. MULTINORMAL MAXIMUM LIKELIHOOD

Consider the multinormal maximum likelihood problem

\[
\min_{\operatorname{rank}(Y)\leq p}\;\min_{\Sigma\in\mathcal{S}_m} F(Y,\Sigma), \tag{1a}
\]

where

\[
F(Y,\Sigma) = \log\det(\Sigma) + \operatorname{tr}\,\Sigma^{-1}(X-Y)'(X-Y), \tag{1b}
\]

and where $\mathcal{S}_m$ is a subset of the convex cone of positive definite matrices of order $m$. This is a straightforward generalization of maximum likelihood estimation in fixed score factor analysis as described by Young [1940]. In factor analysis the set $\mathcal{S}_m$ consists of the diagonal matrices. For that reason we call the technique in which we can handle more general classes of weight matrices Generalized Fixed Score Factor Analysis, or GFSFA.

In FSFA, as Lawley [1942] found, implementation of a straightforward block relaxation algorithm, in which we alternate minimization over $Y$ of rank $p$ for fixed $\Sigma$ and minimization over diagonal $\Sigma$ for fixed $Y$, leads to an unbounded decreasing sequence of loss function values. Thus the minimum is not attained, and the minimizer, which would give the maximum likelihood estimate in the FSFA model, does not exist. An actual proof was given by Anderson and Rubin [1956, section 7.7]. In this short note we generalize the result to GFSFA, with more general sets $\mathcal{S}_m$.

Theorem 2.1. Suppose $\Sigma_n \in \mathcal{S}_m$ is a sequence of positive definite matrices of order $m$, converging to $\Sigma_0$, where $\Sigma_0$ has rank $m-1$. Suppose $u$ is the normalized eigenvector satisfying $\Sigma_0 u = 0$, and assume $\Sigma_n u = \lambda_n u$ for all $n$. Then if $p > 0$ we have

\[
\lim_{n\to\infty}\;\min_{\operatorname{rank}(Y)\leq p} F(Y,\Sigma_n) = -\infty.
\]

Proof. Suppose $u$ is the unique (up to sign) unit length vector in the null space of $\Sigma_0$. Because $\Sigma_0$ commutes with $\Sigma_n$, the vector $u$ is also an eigenvector of $\Sigma_n$. Set $Y = Xuu'$, so that $X - Y = X(I - uu')$ and $(X-Y)'(X-Y) = (I - uu')X'X(I - uu')$. Moreover $\operatorname{rank}(Y) \leq 1 \leq p$. Then, by continuity of the smallest eigenvalue and the corresponding eigenvector,

\[
\lim_{n\to\infty}\operatorname{tr}\,\Sigma_n^{-1}(X-Y)'(X-Y) = \lim_{n\to\infty}\operatorname{tr}\,(I - uu')\Sigma_n^{-1}(I - uu')X'X = \operatorname{tr}\,\Sigma_0^{+}X'X,
\]

which is a finite non-negative number. On the other hand $\lim_{n\to\infty}\log\det(\Sigma_n) = -\infty$, so that $F(Y,\Sigma_n) \to -\infty$ for this fixed $Y$, and a fortiori the minimum over all $Y$ of rank at most $p$ diverges to $-\infty$ as well.

Example 2.1. For Anderson and Rubin's result we take a sequence of positive definite diagonal matrices for which exactly one diagonal element converges to zero. All other diagonal elements can be fixed at one. Clearly Theorem 2.1 also applies if $\mathcal{S}_m$ contains the set of all diagonal matrices, for example if $\mathcal{S}_m$ is the set of all positive definite tri-diagonal matrices.

Example 2.2. Suppose $\mathcal{S}_m$ consists of the equi-correlation matrices, i.e. correlation matrices with all correlations equal to $\rho$. We can choose $\Sigma_0$ as the matrix with $\rho = -1/(m-1)$, which has rank $m-1$. This also proves, for example, that Theorem 2.1 applies to the symmetric positive definite Toeplitz correlation matrices, which have the equi-correlation matrices as a subset.

Note that the condition in the Theorem is sufficient for the log likelihood to be unbounded, but it is not necessary. We have a similar result if $\operatorname{rank}(\Sigma_0) = m-2$ and $p > 1$.
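The mechanism of the theorem, and of Lawley's observation, is easy to verify numerically: with residuals orthogonal to the null direction of $\Sigma_0$, the trace term stays bounded while $\log\det(\Sigma_n)$ diverges. The following NumPy sketch is not part of the original note and uses the diagonal construction of Example 2.1; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 5
X = rng.standard_normal((n, m))

u = np.zeros(m); u[0] = 1.0           # null vector of the limiting Sigma_0
Y = X @ np.outer(u, u)                # rank-1 Y = X u u', feasible when p >= 1
R = X - Y                             # residuals X (I - u u')

for lam in [1.0, 1e-2, 1e-4, 1e-8]:
    Sigma = np.diag([lam] + [1.0] * (m - 1))   # Sigma_n converging to a singular Sigma_0
    F = np.linalg.slogdet(Sigma)[1] + np.trace(np.linalg.solve(Sigma, R.T @ R))
    print(f"lambda = {lam:8.1e}   F = {F:12.4f}")
# the trace term is constant here, log det diverges, so F decreases without bound
```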

3. REMEDIES

3.1. Smaller Covariance Structures. One obvious way around the problem that maximum likelihood estimates do not exist is to work with a smaller $\mathcal{S}_m$, one which does not have a singular matrix of rank $m-1$ in its closure. In FSFA Young [1940] and Whittle [1952], for example, suggest working with the scalar matrices $\sigma^2 I$. More generally, we could use $\sigma^2 D$, where $D$ is any fixed positive definite matrix. Jöreskog [1962], for example, suggests choosing $D$ equal to the diagonal matrix with elements $s_j^2(1 - R_j^2)$, where $s_j^2$ is the sample variance of variable $j$ and $R_j^2$ is the squared multiple correlation of variable $j$ with the remaining $m-1$ variables. In any case, we can find the maximum likelihood estimate of $Y$ by computing the first $p$ singular values and singular vectors of $XD^{-1/2}$, and then compute the maximum likelihood estimate of $\sigma^2$ from the sum of squares of the $m-p$ smallest singular values. Computations are trivial, compared to the complicated iterative procedures in the general case.
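A minimal NumPy sketch of this closed-form computation, assuming $\Sigma = \sigma^2 D$ with a fixed diagonal $D$; it is not part of the original note, the names are illustrative, and the divisor in the estimate of $\sigma^2$ is chosen to match the normalization of $F$ in (1b).

```python
import numpy as np

def fsfa_scalar_cov(X, D, p):
    """ML estimates of rank-p Y and sigma^2 under Sigma = sigma^2 * D (D fixed, diagonal).
    Y comes from the rank-p SVD of X D^{-1/2}; sigma^2 from the discarded singular values."""
    m = X.shape[1]
    D_isqrt = np.diag(1.0 / np.sqrt(np.diag(D)))                  # D^{-1/2}
    U, s, Vt = np.linalg.svd(X @ D_isqrt, full_matrices=False)
    Y = (U[:, :p] * s[:p]) @ Vt[:p, :] @ np.linalg.inv(D_isqrt)   # back to the original metric
    sigma2 = np.sum(s[p:] ** 2) / m                               # from the m - p smallest singular values
    return Y, sigma2

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 6))
D = np.diag(np.var(X, axis=0, ddof=1))    # a simple fixed D, for illustration only
Y_hat, s2_hat = fsfa_scalar_cov(X, D, p=2)
print(np.linalg.matrix_rank(Y_hat), s2_hat)
```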

In GFSFA we can replace the set of all Toeplitz matrices, for example, by the set of all matrices with elements $\sigma_{ij} = \sigma^2\rho^{|i-j|}$, where $-1 < \rho < +1$. These are special Toeplitz matrices, known as Kac-Murdock-Szegő matrices, after Kac et al. [1953], or as AR(1) covariance matrices. They are nonsingular if $\sigma^2 > 0$, and the only nonzero singular matrices in their closure are the matrices with $\rho = \pm 1$, which are of rank 1. Alternatively we can also use constraints on the parameters that rule out degeneracies. In the equi-correlation case, for example, we may require $\rho \geq 0$, which rules out the singular matrix that causes the problems.

3.2. Smaller Mean Structures. The constraint $\operatorname{rank}(Y) \leq p$ is equivalent to $Y = AB'$, where $A$ and $B$ are unknown matrices of dimensions $n \times p$ and $m \times p$. We can require, instead, that $Y = ATB'$, with $A$ and $B$ known, or partially known, and $T$ unknown. This is a Potthoff-Roy model, described in the context of growth curve analysis by Potthoff and Roy [1964]. Clearly there is a range of models here, going from $A$ and $B$ completely known (the regression situation) to $A$ and $B$ completely unknown (the principal component situation). The more structure we impose on $Y$, the less structure, presumably, we have to impose on $\Sigma$.

3.3. Augmented Least Squares.

3.4. Random Scores. The log-likelihood so far has been based on the model of $n$ i.i.d. random variables $\underline{x}_i \sim N(Ba_i,\Sigma)$. We use the convention of underlining random variables [Hemelrijk, 1966]. Alternatively we can assume that $N(Ba_i,\Sigma)$ is the conditional distribution of $\underline{x}_i$ given $\underline{a}_i = a_i$, where the $\underline{a}_i$ are i.i.d. $N(0,\Omega)$. It follows that $\underline{x}_i \sim N(0,\Sigma + B\Omega B')$. In this case the negative log-likelihood is

\[
R(B,\Sigma,\Omega) = \log\det(\Sigma + B\Omega B') + \operatorname{tr}\,X(\Sigma + B\Omega B')^{-1}X'.
\]

This loss function can be written in different forms, using familiar matrix identities (see, for example, De Hoog et al. [1990] or De Leeuw and Meijer [2008, p. 65]):

\[
\operatorname{tr}\,X(\Sigma + B\Omega B')^{-1}X' = \min_{A}\left\{\operatorname{tr}\,\Sigma^{-1}(X - AB')'(X - AB') + \operatorname{tr}\,A\Omega^{-1}A'\right\} = \operatorname{tr}\,X\left(\Sigma^{-1} - \Sigma^{-1}B(\Omega^{-1} + B'\Sigma^{-1}B)^{-1}B'\Sigma^{-1}\right)X',
\]

and

\[
\log\det(\Sigma + B\Omega B') = \log\det(\Sigma) + \log\det(\Omega) + \log\det(\Omega^{-1} + B'\Sigma^{-1}B) = \log\det(\Sigma) + \log\det(B'\Sigma^{-1}B) + \log\det(\Omega + (B'\Sigma^{-1}B)^{-1}).
\]
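Both identities are easy to check numerically. The following NumPy sketch is not part of the original note; the matrices are generated at random and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 40, 5, 2
X = rng.standard_normal((n, m))
B = rng.standard_normal((m, p))
Sigma = np.diag(rng.uniform(0.5, 2.0, size=m))       # diagonal, as in FSFA
Omega = np.eye(p)

V = Sigma + B @ Omega @ B.T                           # marginal covariance
# trace identity: tr X V^{-1} X' equals the Woodbury-type expression
lhs = np.trace(X @ np.linalg.solve(V, X.T))
W = np.linalg.inv(Sigma) - np.linalg.inv(Sigma) @ B @ np.linalg.inv(
        np.linalg.inv(Omega) + B.T @ np.linalg.solve(Sigma, B)) @ B.T @ np.linalg.inv(Sigma)
rhs = np.trace(X @ W @ X.T)
# log-determinant identity
ld1 = np.linalg.slogdet(V)[1]
ld2 = (np.linalg.slogdet(Sigma)[1] + np.linalg.slogdet(Omega)[1]
       + np.linalg.slogdet(np.linalg.inv(Omega) + B.T @ np.linalg.solve(Sigma, B))[1])
print(np.isclose(lhs, rhs), np.isclose(ld1, ld2))     # both True up to rounding
```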

3.5. Maximum Likelihood Ratio. Suppose $\mathcal{S}_m \subseteq \mathcal{C}_m$ are two sets of positive definite matrices. The likelihood ratio test for testing $\mathcal{S}_m$ within $\mathcal{C}_m$ computes

\[
L_p(X) = \min_{\operatorname{rank}(Y)\leq p}\;\min_{\Sigma\in\mathcal{S}_m} F(Y,\Sigma) - \min_{\operatorname{rank}(Y)\leq p}\;\min_{\Sigma\in\mathcal{C}_m} F(Y,\Sigma).
\]

As we have seen, in many cases both terms are equal to $-\infty$, the maximum likelihood estimates do not exist, and the test cannot be used. It was suggested by McDonald [1979], also see Etezadi-Amoli and McDonald [1983], to compute instead

\[
M_p(X) = \min_{\operatorname{rank}(Y)\leq p}\left[\min_{\Sigma\in\mathcal{S}_m} F(Y,\Sigma) - \min_{\Sigma\in\mathcal{C}_m} F(Y,\Sigma)\right].
\]

This amounts to computing maximum likelihood ratio estimates, or MLR estimates. In the case of factor analysis we can choose $\mathcal{S}_m$ as the diagonal matrices and $\mathcal{C}_m$ as the set of all positive definite matrices. Then, with $S(X-Y) = (X-Y)'(X-Y)$,

\[
\min_{\Sigma\in\mathcal{S}_m} F(Y,\Sigma) = \log\det(\operatorname{diag}(S(X-Y))) + m, \qquad
\min_{\Sigma\in\mathcal{C}_m} F(Y,\Sigma) = \log\det(S(X-Y)) + m,
\]

and thus the MLR loss is

\[
M_p(X) = \min_{\operatorname{rank}(Y)\leq p}\left\{-\log\det(R(X-Y))\right\} = -\log\left\{\sup_{\operatorname{rank}(Y)\leq p}\det(R(X-Y))\right\},
\]

where $R(X-Y)$ is the correlation matrix of the residuals $X - Y$. Thus we are maximizing the determinant of the correlation matrix of the residuals, which means making it as close to the identity as possible (in the determinant metric). This seems to be a perfectly respectable loss function for factor analysis, especially because $-\log\det(R(X-Y)) \geq 0$, which means the MLR loss function is bounded from below.
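The MLR loss is straightforward to evaluate for any candidate $Y$ of rank at most $p$. The following NumPy sketch is not part of the original note; it only evaluates the loss at the unweighted rank-$p$ SVD approximation, which could serve as a starting point for an iterative minimization, and all names are illustrative.

```python
import numpy as np

def mlr_loss(X, Y):
    """MLR loss -log det R(X - Y), with R the correlation matrix of the residuals."""
    E = X - Y
    S = E.T @ E
    d = 1.0 / np.sqrt(np.diag(S))
    R = d[:, None] * S * d[None, :]      # residual correlation matrix
    return -np.linalg.slogdet(R)[1]      # >= 0, zero iff R is the identity

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 6))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Y0 = (U[:, :2] * s[:2]) @ Vt[:2, :]      # a rank-2 candidate
print(mlr_loss(X, Y0))                   # the quantity to be minimized over rank-2 Y
```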

3.6. Noncentral Wishart. Anderson and Rubin [1956, section 11].

REFERENCES

T.W. Anderson and H. Rubin. Statistical Inference in Factor Analysis. In J. Neyman, editor, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 5, pages 111-150. University of California Press, 1956.
F.R. De Hoog, T.P. Speed, and E.R. Williams. On a Matrix Identity Associated with Generalized Least Squares. Linear Algebra and Its Applications, 127:449-456, 1990.
J. De Leeuw. Fixed-rank Approximation with Singular Weight Matrices. Computational Statistics Quarterly, 1:3-12, 1984.
J. De Leeuw and E. Meijer, editors. Handbook of Multilevel Analysis. Springer, 2008.
J. Etezadi-Amoli and R.P. McDonald. A Second Generation Nonlinear Factor Analysis. Psychometrika, 48:315-342, 1983.
J. Hemelrijk. Underlining Random Variables. Statistica Neerlandica, 20:1-7, 1966.
K.G. Jöreskog. On the Statistical Treatment of Residuals in Factor Analysis. Psychometrika, 27(4):335-354, 1962.
M. Kac, W.L. Murdock, and G. Szegő. On the Eigenvalues of Certain Hermitian Forms. Journal of Rational Mechanics and Analysis, 2:767-800, 1953.
D.N. Lawley. Further Investigations in Factor Estimation. Proceedings of the Royal Society of Edinburgh, 61:176-185, 1942.
R.P. McDonald. The Simultaneous Estimation of Factor Loadings and Scores. British Journal of Mathematical and Statistical Psychology, 32:212-228, 1979.
R.F. Potthoff and S.N. Roy. A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems. Biometrika, 51:313-326, 1964.
P. Whittle. On Principal Components and Least Squares Methods in Factor Analysis. Skandinavisk Aktuarietidskrift, 35:223-239, 1952.
G. Young. Maximum Likelihood Estimation and Factor Analysis of Data Matrices. Psychometrika, 6(1):49-53, 1940.
H. Zha. The Product-Product Singular Value Decomposition of Matrix Triplets. BIT, 31:711-726, 1991.

DEPARTMENT OF STATISTICS, UNIVERSITY OF CALIFORNIA, LOS ANGELES, CA 90095-1554
E-mail address: deleeuw@stat.ucla.edu
URL: http://gifi.stat.ucla.edu