PRINCALS

The PRINCALS algorithm was first described in Van Rickevorsel and De Leeuw (1979) and De Leeuw and Van Rickevorsel (1980); also see Gifi (1981, 1985). Characteristic features of PRINCALS are the ability to specify any of a number of measurement levels for each variable separately, and the treatment of missing values by setting weights in the loss function equal to 0.

Notation

The following notation is used throughout this chapter unless otherwise noted:

$n$    Number of cases (objects)
$m$    Number of variables
$p$    Number of dimensions

For variable $j$, $j = 1, \ldots, m$:

$h_j$    $n$-vector with categorical observations
$k_j$    Number of categories (distinct values) of variable $j$
$G_j$    Indicator matrix for variable $j$, of order $n \times k_j$

The elements of $G_j$ are defined, for $i = 1, \ldots, n$ and $r = 1, \ldots, k_j$, as

$g_{(j)ir} = 1$ when the $i$th object is in the $r$th category of variable $j$, and $g_{(j)ir} = 0$ when the $i$th object is not in the $r$th category of variable $j$.

$M_j$    Binary, diagonal $n \times n$ matrix. The diagonal elements of $M_j$ are

$m_{(j)ii} = 1$ when the $i$th observation is within the range $[1, k_j]$, and $m_{(j)ii} = 0$ when the $i$th observation is outside the range $[1, k_j]$.

$D_j$    Diagonal $k_j \times k_j$ matrix containing the univariate marginals; that is, the column sums of $G_j$.
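As an illustration (not part of the original program description), the following NumPy sketch builds $G_j$, the diagonal of $M_j$, and the diagonal of $D_j$ for a single variable. It assumes category codes $1, \ldots, k_j$, with any other code treated as out of range; the helper name and its arguments are hypothetical.

```python
import numpy as np

def indicator_devices(h, k):
    """Return (G, M_diag, D_diag) for one variable with integer codes in h."""
    n = len(h)
    G = np.zeros((n, k))                       # indicator matrix G_j, order n x k_j
    valid = (h >= 1) & (h <= k)                # within the range [1, k_j]?
    G[np.where(valid)[0], h[valid] - 1] = 1.0  # g_(j)ir = 1 for the observed category
    M_diag = valid.astype(float)               # diagonal of M_j (0 marks a missing value)
    D_diag = G.sum(axis=0)                     # univariate marginals: column sums of G_j
    return G, M_diag, D_diag

# toy data: 5 objects, 3 categories, the code 0 is out of range (missing)
G, M_diag, D_diag = indicator_devices(np.array([1, 3, 2, 0, 3]), 3)
```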

The quantification matrices and parameter vectors are:

$X$    Object scores, of order $n \times p$
$Y_j$    Multiple category quantifications, of order $k_j \times p$
$y_j$    Single category quantifications, of order $k_j$
$a_j$    Variable weights (equal to component loadings), of order $p$
$Q$    Transformed data matrix of order $n \times m$, with columns $q_j = G_j y_j$
$Y$    Collection of multiple and single category quantifications

Note: The matrices $G_j$, $M_j$, and $D_j$ are exclusively notational devices; they are stored in reduced form, and the program fully profits from their sparseness by replacing matrix multiplications with selective accumulation.

Objective Function Optimization

The PRINCALS objective is to find object scores $X$ and a set of $\underline{Y}_j$ (for $j = 1, \ldots, m$), where the underlining indicates that they are possibly restricted, so that the function

$$\sigma(X; Y) = \frac{1}{m} \sum_j \operatorname{tr}\, (X - G_j Y_j)' M_j (X - G_j Y_j)$$

is minimal, under the normalization restriction $X' M_* X = mnI$, where the matrix $M_* = \sum_j M_j$ and $I$ is the $p \times p$ identity matrix. The inclusion of $M_j$ in $\sigma(X; Y)$ ensures that there is no influence of data values outside the range $[1, k_j]$, which may be really missing or merely regarded as such; $M_*$ contains the number of active data values for each object. The object scores are also centered; that is, they satisfy $u' M_* X = 0$, with $u$ denoting an $n$-vector with ones.
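The loss function can be written down directly. The sketch below (an illustration, not the program's internal code) evaluates $\sigma(X; Y)$ with each $M_j$ stored as a vector of diagonal weights, which matches the note above about exploiting sparseness.

```python
import numpy as np

def princals_loss(X, G_list, Y_list, M_list):
    """sigma(X; Y) = (1/m) sum_j tr (X - G_j Y_j)' M_j (X - G_j Y_j)."""
    m = len(G_list)
    total = 0.0
    for G, Y, M_diag in zip(G_list, Y_list, M_list):
        R = X - G @ Y                               # residual for variable j, order n x p
        total += np.sum(M_diag[:, None] * R * R)    # tr R' M_j R with diagonal M_j
    return total / m
```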

The following measurement levels are distinguished in PRINCALS:

Multiple Nominal      $Y_j = Y_j$ (unrestricted)
Single Nominal        $Y_j = y_j a_j'$ (rank-one restriction only)
(Single) Ordinal      $Y_j = y_j a_j'$ and $y_j \in C_j$ (rank-one and monotonicity restrictions)
(Single) Numerical    $Y_j = y_j a_j'$ and $y_j \in L_j$ (rank-one and linearity restrictions)

For each variable, these levels can be chosen independently. The general requirement in the single options is $Y_j = y_j a_j'$; that is, $Y_j$ is of rank one. For identification purposes, $y_j$ is always normalized so that $y_j' D_j y_j = n$, which implies that the variance of the quantified variable $q_j = G_j y_j$ is 1. In the ordinal case, the additional restriction $y_j \in C_j$ means that $y_j$ must be located in the convex cone of all $k_j$-vectors with nondecreasing elements. In the numerical case, the additional restriction $y_j \in L_j$ means that $y_j$ must be located in the subspace of all $k_j$-vectors that are a linear transformation of the vector consisting of $k_j$ successive integers.

Optimization is achieved by executing the following iteration scheme twice:

(1) Initialization (a) or (b)
(2) Update object scores

(3) Orthonormalization
(4) Update category quantifications
(5) Convergence test: repeat (2) through (4) or continue
(6) Rotation

The first time (for the initial configuration), initialization (a) described below is used, and all single variables are temporarily treated as single numerical; multiple nominal variables are treated as multiple nominal. The second time (for the final configuration), initialization (b) is used. Steps (1) through (6) are explained below.

(1) Initialization

(a) The object scores $X$ are initialized with random numbers, which are normalized so that $u' M_* X = 0$ and $X' M_* X = mnI$, yielding $\tilde X$. For multiple variables, the initial category quantifications are obtained as $\tilde Y_j = D_j^{-1} G_j' \tilde X$. For single variables, the initial category quantifications $\tilde y_j$ are defined as the first $k_j$ successive integers, normalized so that $u' D_j \tilde y_j = 0$ and $\tilde y_j' D_j \tilde y_j = n$, and the initial variable weights are calculated as the vector $\tilde a_j = \tilde X' G_j \tilde y_j$, rescaled to unit length.

(b) All relevant quantities are copied from the results of the first cycle.

(2) Update object scores

First, the auxiliary score matrix $Z$ is computed as

$$Z \leftarrow \sum_j M_j G_j Y_j$$

and centered with respect to $M_*$:

$$\tilde Z \leftarrow \left\{ M_* - M_* u u' M_* / (u' M_* u) \right\} Z .$$

These two steps yield locally the best updates when there are no orthogonality constraints.

(3) Orthonormalization

The orthonormalization problem is to find an $M_*$-orthonormal $X^+$ that is closest to $\tilde Z$ in the least squares sense. In PRINCALS, this is done by setting

$$X^+ \leftarrow m^{1/2} M_*^{-1/2}\, \mathrm{GRAM}\!\left(M_*^{-1/2} \tilde Z\right),$$

which is equal to the genuine least squares estimate up to a rotation; see (6). The notation GRAM( ) is used to denote the Gram-Schmidt transformation (Björck and Golub, 1973).
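A compact sketch of steps (2) and (3), under the formulas as reconstructed above, is given below. It is illustrative only: `np.linalg.qr` stands in for the GRAM transformation, dense matrices are used instead of the program's reduced storage, and it assumes every object has at least one active data value so that $M_*$ is invertible.

```python
import numpy as np

def update_and_orthonormalize(G_list, Y_list, M_list):
    """One pass of steps (2)-(3): auxiliary scores Z, centering, and X+."""
    m = len(G_list)
    n, p = G_list[0].shape[0], Y_list[0].shape[1]
    Mstar = np.sum(M_list, axis=0)                      # diagonal of M_* = sum_j M_j
    Z = np.zeros((n, p))
    for G, Y, M_diag in zip(G_list, Y_list, M_list):
        Z += M_diag[:, None] * (G @ Y)                  # Z <- sum_j M_j G_j Y_j
    # center with respect to M_*: Z~ = {M_* - M_* u u' M_* / (u' M_* u)} Z
    Zt = Mstar[:, None] * Z - np.outer(Mstar, Mstar @ Z) / Mstar.sum()
    # X+ = m^{1/2} M_*^{-1/2} GRAM(M_*^{-1/2} Z~); QR plays the role of Gram-Schmidt here
    W, _ = np.linalg.qr(Zt / np.sqrt(Mstar)[:, None])
    return np.sqrt(m) * W / np.sqrt(Mstar)[:, None]
```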

(4) Update category quantifications; loop across variables $j = 1, \ldots, m$

(a) For multiple nominal variables, the new category quantifications are computed as

$$Y_j^+ = D_j^{-1} G_j' X .$$

(b) For single variables, first an unconstrained update is computed in the same way:

$$\tilde Y_j = D_j^{-1} G_j' X .$$

Next, one cycle of an ALS algorithm (De Leeuw et al., 1976) is executed for computing a rank-one decomposition of $\tilde Y_j$, with restrictions on the left-hand vector. This cycle starts from the previous single quantifications $\tilde y_j$ with

$$a_j^+ = \tilde Y_j' D_j \tilde y_j .$$

When the current variable is numerical, we are ready; otherwise we compute

$$y_j = \tilde Y_j a_j^+ .$$

Now, when the current variable is single nominal, you can simply obtain $y_j^+$ by normalizing $y_j$ in the way indicated below; otherwise the variable must be ordinal, and you have to insert the weighted monotonic regression process

$$y_j \leftarrow \mathrm{WMON}(y_j),$$

which makes $y_j$ monotonically increasing. The weights used are the diagonal elements of $D_j$, and the subalgorithm used is the up-and-down-blocks minimum violators algorithm (Kruskal, 1964; Barlow et al., 1972). The result is normalized:

$$y_j^+ = n^{1/2}\, y_j \left(y_j' D_j y_j\right)^{-1/2} .$$

Finally, we set

$$Y_j^+ = y_j^+ a_j^{+\prime} .$$
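The monotone regression and the final rescaling can be illustrated as follows. This is a sketch, not the program's own subroutine: a standard pool-adjacent-violators pass is used in place of the up-and-down-blocks minimum violators algorithm (both solve the same weighted isotonic least squares problem), and positive weights are assumed.

```python
import numpy as np

def wmon(y, w):
    """Weighted monotone (nondecreasing) regression of y with positive weights w."""
    blocks = []                                # each block: [weighted mean, weight, size]
    for val, weight in zip(y, w):
        blocks.append([float(val), float(weight), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()          # merge adjacent blocks that violate order
            v1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / wt, wt, n1 + n2])
    return np.concatenate([np.full(size, v) for v, _, size in blocks])

def normalize_quantifications(y, D_diag, n):
    """Rescale so that y' D_j y = n, as required after WMON."""
    return np.sqrt(n) * y / np.sqrt(np.sum(D_diag * y * y))
```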

(5) Convergence test

The difference between consecutive values of the quantity

$$\mathrm{TFIT} = \frac{1}{m} \left( \sum_{j \in J} \sum_s y_{j(s)}' D_j y_{j(s)} + \sum_{j \notin J} a_j' a_j \right),$$

where $y_{j(s)}$ denotes the $s$th column of $Y_j$ and where $J$ is an index set recording which variables are multiple, is compared with the user-specified convergence criterion $\varepsilon$, a small positive number. It can be shown that $\mathrm{TFIT} = p - \sigma(X; Y)$. Steps (2) through (4) are repeated as long as the loss difference exceeds $\varepsilon$.

(6) Rotation

As remarked in (3), during iteration the orientation of $X$ and $Y$ with respect to the coordinate system is not necessarily correct; this also reflects that $\sigma(X; Y)$ is invariant under simultaneous rotations of $X$ and $Y$. From the theory of principal components, it is known that if all variables were single, the matrix $A$, which can be formed by stacking the row vectors $a_j'$, has the property that $A'A$ is diagonal. Therefore we can rotate so that the matrix

$$\frac{1}{m} A'A = \frac{1}{m} \sum_j a_j a_j' = \frac{1}{m} \sum_j Y_j' D_j Y_j$$

becomes diagonal. The corresponding eigenvalues are printed after the convergence message of the program. The calculation involves tridiagonalization with Householder transformations, followed by the implicit QL algorithm (Wilkinson, 1965).
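The rotation step amounts to a symmetric eigendecomposition of the $p \times p$ matrix above. The sketch below illustrates this with NumPy's `eigh` standing in for the Householder tridiagonalization and implicit QL route mentioned in the text; the function name and calling convention are illustrative.

```python
import numpy as np

def rotate_solution(X, Y_list, D_list, m):
    """Diagonalize (1/m) sum_j Y_j' D_j Y_j and rotate X and all Y_j accordingly."""
    S = sum(Y.T @ (D[:, None] * Y) for Y, D in zip(Y_list, D_list)) / m
    eigvals, K = np.linalg.eigh(S)             # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]          # report the largest eigenvalue first
    K = K[:, order]
    return X @ K, [Y @ K for Y in Y_list], eigvals[order]
```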

Diagnostics

Maximum Rank (may be issued as a warning when exceeded)

The maximum rank $p_{\max}$ indicates the maximum number of dimensions that can be computed for any data set. In general,

$$p_{\max} = \min\left\{\, n - 1,\; \sum_{j \in J} k_j - \max(m_1, 1) + \max(m_2, 1) \right\},$$

where $m_1$ is the number of multiple variables with no missing values, $m_2$ is the number of single variables, and $J$ is an index set recording which variables are multiple. Although the number of nontrivial dimensions may be less than $p_{\max}$ when $m = 2$, PRINCALS does allow dimensionalities all the way up to $p_{\max}$. When, due to empty categories in the actual data, the rank deteriorates below the specified dimensionality, the program stops.

Marginal Frequencies

The frequencies table gives the univariate marginals and the number of missing values (that is, values that are regarded as out of range for the current analysis) for each variable. These are computed as the column sums of $D_j$ and the total sum of $M_j$.

Fit and Loss Measures

When the HISTORY option is in effect, the following fit and loss measures are reported:

(a) Total fit. This is the quantity TFIT defined in (5).

(b) Total loss. This is $\sigma(X; Y)$, computed as the sum of multiple loss and single loss defined below.

(c) Multiple loss. This measure is computed as

$$\mathrm{TMLOSS} = p - \frac{1}{m} \sum_j \operatorname{tr}\, Y_j' D_j Y_j .$$

(d) Single loss. This measure is computed only when some of the variables are single:

$$\mathrm{SLOSS} = \frac{1}{m} \sum_{j \notin J} \operatorname{tr}\, Y_j' D_j Y_j - \frac{1}{m} \sum_{j \notin J} a_j' a_j .$$
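Under the fit and loss formulas as reconstructed above, the reported quantities can be sketched as follows. This is illustrative only; it assumes that, for single variables, the unrestricted quantifications $D_j^{-1} G_j' X$ are supplied, and the function name and argument layout are hypothetical.

```python
import numpy as np

def fit_and_loss(Y_list, D_list, a_list, is_multiple, p):
    """Return (TFIT, TMLOSS, SLOSS, total loss) from per-variable quantifications.
    a_list holds None for multiple variables; is_multiple is a list of booleans."""
    m = len(Y_list)
    tr = [np.sum(D[:, None] * Y * Y) for Y, D in zip(Y_list, D_list)]   # tr Y_j' D_j Y_j
    tfit = (sum(t for t, mult in zip(tr, is_multiple) if mult) +
            sum(a @ a for a, mult in zip(a_list, is_multiple) if not mult)) / m
    tmloss = p - sum(tr) / m                                            # multiple loss
    sloss = (sum(t for t, mult in zip(tr, is_multiple) if not mult) -
             sum(a @ a for a, mult in zip(a_list, is_multiple) if not mult)) / m
    return tfit, tmloss, sloss, tmloss + sloss
```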

Eigenvalues and Correlations between Optimally Scaled Variables

If there are no missing data, the eigenvalues printed by PRINCALS are those of $\frac{1}{m} R(Q)$, where $R(Q)$ denotes the matrix of correlations between the optimally scaled variables in the columns of $Q$. For multiple variables, $q_j$ is defined here as $G_j y_{j(1)}$, with $y_{j(1)}$ the first column of $Y_j$. When all variables are single or when $p = 1$, $R(Q)$ itself is also printed.

If there are missing data, then the eigenvalues are those of the matrix with elements $q_j' M_* q_l$, which is not necessarily a correlation matrix, although it is positive semidefinite.
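For the complete-data case, this diagnostic can be illustrated with a short sketch (not the program's own routine): build $Q$ from the quantified variables, form $R(Q)$, and take the eigenvalues of $\frac{1}{m} R(Q)$.

```python
import numpy as np

def scaled_variable_eigenvalues(G_list, y_list):
    """Correlations between optimally scaled variables and eigenvalues of (1/m) R(Q).
    y_list holds y_j for single variables and the first column of Y_j for multiple ones."""
    m = len(G_list)
    Q = np.column_stack([G @ y for G, y in zip(G_list, y_list)])   # columns q_j = G_j y_j
    R = np.corrcoef(Q, rowvar=False)                               # correlation matrix R(Q)
    eigvals = np.sort(np.linalg.eigvalsh(R / m))[::-1]
    return R, eigvals
```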

References

Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. 1972. Statistical inference under order restrictions. New York: John Wiley & Sons, Inc.

Björck, A., and Golub, G. H. 1973. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27: 579-594.

De Leeuw, J., and Van Rickevorsel, J. 1980. HOMALS and PRINCALS: Some generalizations of principal components analysis. In: Data Analysis and Informatics, E. Diday et al., eds. Amsterdam: North-Holland.

De Leeuw, J., Young, F. W., and Takane, Y. 1976. Additive structure in qualitative data: an alternating least squares method with optimal scaling features. Psychometrika, 41: 471-503.

Gifi, A. 1981. Nonlinear multivariate analysis. Leiden: Department of Data Theory.

Gifi, A. 1985. PRINCALS. Leiden: Department of Data Theory, Internal Report UG-85-02.

Kruskal, J. B. 1964. Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29: 115-129.

Van Rickevorsel, J., and De Leeuw, J. 1979. An outline of PRINCALS. Leiden: Department of Data Theory, Internal Report RB 002-79.

Wilkinson, J. H. 1965. The algebraic eigenvalue problem. Oxford: Clarendon Press.