REGRESSION, DISCRIMINANT ANALYSIS, AND CANONICAL CORRELATION ANALYSIS WITH HOMALS

JAN DE LEEUW

Date: Sunday 26th April, 2009.

ABSTRACT. It is shown that the homals package in R can be used for multiple regression, multi-group discriminant analysis, and canonical correlation analysis. The homals solutions differ from the more conventional ones only in the way the dimensions are scaled by the eigenvalues.

1. MORALS

Suppose we have $m + 1$ variables, with the first $m$ being predictors (or independent variables) and the last one the outcome (or dependent variable). In homals [De Leeuw and Mair, 2009] we use ndim=1, sets=list(1:m,m+1), rank=1, which means the loss function is
$$
\sigma(x,a,q) = (x - a_{m+1}q_{m+1})'(x - a_{m+1}q_{m+1}) + \Big(x - \sum_{j=1}^{m} a_j q_j\Big)'\Big(x - \sum_{j=1}^{m} a_j q_j\Big),
$$
with the $q_j$ the quantified or transformed variables. This must be minimized over $a$, $x$, and $q$ under the conditions that $u'x = u'q_j = 0$ and $x'x = q_j'q_j = 1$, and of course that $q_j \in \mathcal{K}_j$, the appropriate set of admissible transformations.
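For concreteness, here is a minimal sketch of such a call. The toy data frame dat is hypothetical, and apart from the arguments quoted above (ndim, sets, rank) nothing further about the homals interface is assumed.

    ## Hypothetical MORALS-type call: one dimension, the m predictors in
    ## one set, the outcome in a set of its own, all variables of rank one.
    library(homals)
    set.seed(1)
    m <- 3
    dat <- as.data.frame(lapply(1:(m + 1), function(j)
      factor(sample(1:4, 100, replace = TRUE))))   # toy categorical data
    names(dat) <- c(paste0("x", 1:m), "y")
    fit <- homals(dat, ndim = 1, sets = list(1:m, m + 1), rank = 1)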

Write $Q = [q_1 \mid \cdots \mid q_m]$ and $b = (a_1, \ldots, a_m)'$. Also write $s = a_{m+1}$ and $y = q_{m+1}$. Then
$$
\sigma(x,a,q) = (x - sy)'(x - sy) + (x - Qb)'(x - Qb).
$$
It follows that $s = x'y$ and $b = (Q'Q)^{-1}Q'x$, as well as
$$
x = \frac{sy + Qb}{\|sy + Qb\|}.
$$
Thus $x$ is the normalized eigenvector corresponding with the largest eigenvalue of $K = yy' + P$, where $P = Q(Q'Q)^{-1}Q'$. But the non-zero eigenvalues of $K$ are the squares of the non-zero singular values of $[\,y \mid Q(Q'Q)^{-1/2}\,]$, and these are the same as the non-zero eigenvalues of
$$
H = \begin{bmatrix} 1 & y'Q(Q'Q)^{-1/2} \\ (Q'Q)^{-1/2}Q'y & I \end{bmatrix}.
$$
Define the usual regression quantities $\beta = (Q'Q)^{-1}Q'y$ and $\rho^2 = y'Q(Q'Q)^{-1}Q'y$. The eigenvalues of $H$ are $1+\rho$, $1-\rho$, and $1$ with multiplicity $m-1$. An eigenvector corresponding with the dominant eigenvalue is
$$
\begin{bmatrix} \rho \\ (Q'Q)^{-1/2}Q'y \end{bmatrix},
$$
and it follows that an eigenvector corresponding with the dominant eigenvalue of $K$ is $(Q(Q'Q)^{-1}Q' + \rho I)y$. Thus
$$
x = \frac{1}{\rho\sqrt{2(1+\rho)}}\,(Q(Q'Q)^{-1}Q' + \rho I)y,
$$
and consequently
$$
b = \frac{1}{\rho}\sqrt{\frac{1+\rho}{2}}\,\beta, \qquad s = \sqrt{\frac{1+\rho}{2}}.
$$
The vector of regression coefficients $\beta$ is thus proportional to $b$, and the two are identical if and only if $\rho = 1$. The minimum loss function value is $1 - \rho$. Thus, ultimately, we find transformations $q_j$ of the variables such that the multiple correlation is maximized.
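These results are easy to verify numerically. The base-R sketch below is an illustration only: the scores in $Q$ and $y$ are held fixed at arbitrary centered and normalized values rather than optimally transformed, and we check the dominant eigenvalue, the expression for $x$, the proportionality of $b$ and $\beta$, and the minimum loss.

    ## Check the MORALS algebra for fixed centered, normalized scores.
    set.seed(2)
    n <- 50; m <- 4
    nrm <- function(v) { v <- v - mean(v); v / sqrt(sum(v^2)) }
    Q <- apply(matrix(rnorm(n * m), n, m), 2, nrm)
    y <- nrm(rnorm(n))
    P <- Q %*% solve(crossprod(Q), t(Q))            # P = Q(Q'Q)^{-1}Q'
    K <- tcrossprod(y) + P                          # K = yy' + P
    beta <- solve(crossprod(Q), crossprod(Q, y))    # (Q'Q)^{-1}Q'y
    rho <- sqrt(sum(y * (P %*% y)))                 # multiple correlation
    eigen(K, symmetric = TRUE)$values[1] - (1 + rho)         # ~0
    x <- (P %*% y + rho * y) / (rho * sqrt(2 * (1 + rho)))
    max(abs(K %*% x - (1 + rho) * x))               # ~0: dominant eigenvector
    b <- solve(crossprod(Q), crossprod(Q, x))
    max(abs(b - sqrt((1 + rho) / 2) / rho * beta))  # ~0: b proportional to beta
    s <- sum(x * y)                                 # equals sqrt((1 + rho) / 2)
    sum((x - s * y)^2) + sum((x - Q %*% b)^2) - (1 - rho)    # ~0: minimum loss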

2. CRIMINALS

Again we have $m + 1$ variables, with the first $m$ being predictors and the last one the outcome. But now the outcome is a categorical variable with $k$ categories. In homals we use ndim=p, sets=list(1:m,m+1), rank=c(rep(1,m),p), where $p < k$. The loss function is
$$
\sigma(X,A,Q,Y) = \mathrm{tr}\,(X - GY)'(X - GY) + \mathrm{tr}\,(X - QA)'(X - QA),
$$
where $G$ is the indicator matrix of the outcome, and where we require $u'X = u'Q = 0$ and $X'X = \mathrm{diag}(Q'Q) = I$. Now we must have, at the minimum,
$$
Y = (G'G)^{-1}G'X, \qquad A = (Q'Q)^{-1}Q'X.
$$
Thus $X$ contains the normalized eigenvectors corresponding with the $p$ largest eigenvalues of $K = G(G'G)^{-1}G' + Q(Q'Q)^{-1}Q'$, and $X$ also contains the normalized left singular vectors of $[\,G(G'G)^{-1/2} \mid Q(Q'Q)^{-1/2}\,]$. We can find the right singular vectors as the eigenvectors of
$$
H = \begin{bmatrix} I & (G'G)^{-1/2}G'Q(Q'Q)^{-1/2} \\ (Q'Q)^{-1/2}Q'G(G'G)^{-1/2} & I \end{bmatrix}.
$$
Now let $U\Psi V'$ be the singular value decomposition of $(G'G)^{-1/2}G'Q(Q'Q)^{-1/2}$. Then the eigenvectors of $H$ corresponding with the largest eigenvalues $I + \Psi$ are the columns of
$$
\begin{bmatrix} U \\ V \end{bmatrix}.
$$
Take the eigenvectors corresponding with the $p$ largest singular values $\Psi_p$, with blocks $U_p$ and $V_p$. The corresponding left singular vectors are proportional to $\tilde X = G(G'G)^{-1/2}U_p + Q(Q'Q)^{-1/2}V_p$. Because $\tilde X'\tilde X = 2(I + \Psi_p)$, we find
$$
X = 2^{-1/2}\,\big(G(G'G)^{-1/2}U_p + Q(Q'Q)^{-1/2}V_p\big)(I + \Psi_p)^{-1/2}.
$$
Thus
$$
Y = 2^{-1/2}\,(G'G)^{-1/2}U_p(I + \Psi_p)^{1/2}, \qquad A = 2^{-1/2}\,(Q'Q)^{-1/2}V_p(I + \Psi_p)^{1/2},
$$
and $X = (GY + QA)(I + \Psi_p)^{-1}$. Also note that $Y'G'GY = A'Q'QA = \frac12(I + \Psi_p)$, while $Y'G'QA = \frac12\Psi_p(I + \Psi_p)$. The minimum value of the loss function is $p - \mathrm{tr}\,\Psi_p$.
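The spectral structure derived above can be checked in the same way. In the base-R sketch below the indicator $G$ and the centered scores $Q$ are again fixed and hypothetical; we verify that the leading eigenvalues of $K$ are $I + \Psi$.

    ## Check that the largest eigenvalues of K are 1 + the singular
    ## values of (G'G)^{-1/2} G'Q (Q'Q)^{-1/2}, for fixed scores.
    set.seed(3)
    n <- 60; m <- 4; k <- 3
    g <- factor(sample(1:k, n, replace = TRUE))
    G <- model.matrix(~ g - 1)                      # n x k indicator matrix
    Q <- scale(matrix(rnorm(n * m), n, m), scale = FALSE)  # centered scores
    proj <- function(Z) Z %*% solve(crossprod(Z), t(Z))
    msqrti <- function(S) {                         # S^{-1/2}, S symmetric p.d.
      e <- eigen(S, symmetric = TRUE)
      e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
    }
    K <- proj(G) + proj(Q)
    C <- msqrti(crossprod(G)) %*% crossprod(G, Q) %*% msqrti(crossprod(Q))
    Psi <- svd(C)$d
    max(abs(eigen(K, symmetric = TRUE)$values[seq_along(Psi)] - (1 + Psi)))  # ~0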

Now let us compare these computations with the usual canonical discriminant analysis. There we compute the projector $P = G(G'G)^{-1}G'$ and the between-groups dispersion matrix $B = Q'PQ$, and we solve the generalized eigenvalue problem $BZ = TZ\Lambda$, where $T = Q'Q$ is the total dispersion. The problem is normalized by setting $Z'TZ = I$. Thus, using the $p$ largest eigenvalues,
$$
Q'G(G'G)^{-1}G'QZ_p = Q'QZ_p\Lambda_p.
$$
This immediately gives $\Lambda_p = \Psi_p^2$. Also $(Q'Q)^{1/2}Z_p = V_p$, or $Z_p = \sqrt{2}\,A(I + \Psi_p)^{-1/2}$. For the group means $M_p = (G'G)^{-1}G'QZ_p$ we find $M_p = \sqrt{2}\,Y\Psi_p(I + \Psi_p)^{-1/2}$. Thus both $Z_p$ and $M_p$ are simple rescalings of $A$ and $Y$. homals finds the transformations of the variables that maximize the sum of the $p$ largest singular values of $(G'G)^{-1/2}G'Q(Q'Q)^{-1/2}$.

3. CANALS

Canonical correlation analysis with homals has $m_1 + m_2$ variables, and we use ndim=p, sets=list(1:m1,m1+(1:m2)), rank=rep(1,m1+m2). The loss is
$$
\sigma(X,A,Q) = \mathrm{tr}\,(X - Q_1A_1)'(X - Q_1A_1) + \mathrm{tr}\,(X - Q_2A_2)'(X - Q_2A_2).
$$
Since our analysis of discriminant analysis in homals never actually used the fact that $G$ was an indicator, the results are exactly the same as in the previous section (with the obvious substitutions).

In classical canonical correlation analysis the function $\mathrm{tr}\,R'Q_1'Q_2S$ is maximized over $R'Q_1'Q_1R = I$ and $S'Q_2'Q_2S = I$. This means solving
$$
Q_1'Q_2S = Q_1'Q_1R\Phi, \qquad Q_2'Q_1R = Q_2'Q_2S\Phi.
$$
From homals, as before,
$$
A_1 = 2^{-1/2}\,(Q_1'Q_1)^{-1/2}U_p(I + \Psi_p)^{1/2}, \qquad A_2 = 2^{-1/2}\,(Q_2'Q_2)^{-1/2}V_p(I + \Psi_p)^{1/2},
$$
while in canonical analysis $\Phi = \Psi$ and
$$
R = (Q_1'Q_1)^{-1/2}U_p = \sqrt{2}\,A_1(I + \Psi_p)^{-1/2}, \qquad S = (Q_2'Q_2)^{-1/2}V_p = \sqrt{2}\,A_2(I + \Psi_p)^{-1/2}.
$$
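The identity $\Phi = \Psi$ can be checked against the classical solution as computed by cancor from the base-R stats package; the sketch below again holds the centered scores $Q_1$ and $Q_2$ fixed at arbitrary values.

    ## Check that the singular values of (Q1'Q1)^{-1/2} Q1'Q2 (Q2'Q2)^{-1/2}
    ## are the classical canonical correlations, for fixed centered scores.
    set.seed(4)
    n <- 60; m1 <- 3; m2 <- 4
    Q1 <- scale(matrix(rnorm(n * m1), n, m1), scale = FALSE)
    Q2 <- scale(matrix(rnorm(n * m2), n, m2), scale = FALSE)
    msqrti <- function(S) {                         # S^{-1/2}, S symmetric p.d.
      e <- eigen(S, symmetric = TRUE)
      e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
    }
    C <- msqrti(crossprod(Q1)) %*% crossprod(Q1, Q2) %*% msqrti(crossprod(Q2))
    Psi <- svd(C)$d                                 # singular values Psi
    Phi <- cancor(Q1, Q2)$cor                       # canonical correlations Phi
    max(abs(Psi[seq_along(Phi)] - Phi))             # ~0: Phi = Psi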

Again we see the same type of rescaling of the canonical weights. Note that homals does not find the transformations that maximize the sum of the squared canonical correlations, which is the target function in the original CANALS approach [Young et al., 1976; Van der Burg and De Leeuw, 1983]. Maximizing the sum of the squared canonical correlations means maximizing a different aspect of the correlation matrix [De Leeuw, 1988, 1990].

REFERENCES

J. De Leeuw. Multivariate Analysis with Linearizable Regressions. Psychometrika, 53:437-454, 1988.

J. De Leeuw. Multivariate Analysis with Optimal Scaling. In S. Das Gupta and J. Sethuraman, editors, Progress in Multivariate Analysis, Calcutta, India, 1990. Indian Statistical Institute.

J. De Leeuw and P. Mair. Homogeneity Analysis in R: The Package homals. Journal of Statistical Software, in press, 2009.

E. Van der Burg and J. De Leeuw. Non-linear Canonical Correlation. British Journal of Mathematical and Statistical Psychology, 36:54-80, 1983.

F. W. Young, J. De Leeuw, and Y. Takane. Regression with Qualitative and Quantitative Variables: An Alternating Least Squares Method with Optimal Scaling Features. Psychometrika, 41:505-529, 1976.

DEPARTMENT OF STATISTICS, UNIVERSITY OF CALIFORNIA, LOS ANGELES, CA 90095-1554

E-mail address: deleeuw@stat.ucla.edu
URL: http://gifi.stat.ucla.edu