DYNAMIC AND COMPROMISE FACTOR ANALYSIS

Marianna Bolla
Budapest University of Technology and Economics
marib@math.bme.hu

Many parts are joint work with Gy. Michaletzky (Loránd Eötvös University) and G. Tusnády (Rényi Institute, Hungarian Academy of Sciences).

March 5, 2009

Motivation Having multivariate time series, e.g., financial or economic data observed at regular time intervals, we want to describe the components of the time series with a smaller number of uncorrelated factors. The usual factor model of multivariate analysis cannot be applied immediately, as the factor process also varies in time. A dynamic part is added to the usual factor model: the autoregressive process of the factors. Dynamic factors can be identified with some latent driving forces of the whole process. The factors themselves can be identified only by the expert (e.g., monetary factors).

Remarks The model is applicable to weakly stationary (covariance-stationary) multivariate processes. The first descriptions of the model are found in J. F. Geweke, International Economic Review 22 (1977), and in Gy. Bánkövi et al., Zeitschrift für Angewandte Mathematik und Mechanik 63 (1981). Since then, the model has been developed so that dynamic factors can be extracted not only sequentially, but also simultaneously. For this purpose we had to solve the problem of finding extrema of inhomogeneous quadratic forms in Bolla et al., Lin. Alg. Appl. 269 (1998).

The model The input data are $n$-dimensional observations $y(t) = (y_1(t), \dots, y_n(t))$, where $t$ is the time and the process is observed at discrete moments between two limits ($t = t_1, \dots, t_2$). For a given positive integer $M < n$ we are looking for uncorrelated factors $F_1(t), \dots, F_M(t)$ such that they satisfy the following model equations:

1. As in the usual linear model,

$$F_m(t) = \sum_{i=1}^{n} b_{mi}\, y_i(t), \qquad t = t_1, \dots, t_2; \; m = 1, \dots, M. \tag{1}$$

2. The dynamic equation of the factors:

$$\hat F_m(t) = c_{m0} + \sum_{k=1}^{L} c_{mk}\, F_m(t-k), \qquad t = t_1 + L, \dots, t_2; \; m = 1, \dots, M, \tag{2}$$

where the time lag $L$ is a given positive integer and $\hat F_m(t)$ is the autoregressive prediction of the $m$th factor at date $t$ (the white-noise term is omitted, therefore we use $\hat F_m$ instead of $F_m$).

3. The linear prediction of the variables by the factors, as in the usual factor model:

$$\hat y_i(t) = d_{0i} + \sum_{m=1}^{M} d_{mi}\, F_m(t), \qquad t = t_1, \dots, t_2; \; i = 1, \dots, n. \tag{3}$$

(The error term is also omitted, which is why we use the notation $\hat y_i$ instead of $y_i$.)

The objective function We want to estimate the parameters of the model, in matrix notation $B = (b_{mi})$, $C = (c_{mk})$, $D = (d_{mi})$ ($m = 1, \dots, M$; $i = 1, \dots, n$; $k = 1, \dots, L$), estimates of the parameters $c_{m0}$, $d_{0i}$ following from these, such that the objective function

$$w_0 \sum_{m=1}^{M} \operatorname{var}(F_m - \hat F_m) + \sum_{i=1}^{n} w_i \operatorname{var}(y_i - \hat y_i) \tag{4}$$

is minimal under the conditions on the orthogonality and variance of the factors:

$$\operatorname{cov}(F_m, F_l) = 0, \; m \neq l; \qquad \operatorname{var}(F_m) = v_m, \quad m = 1, \dots, M, \tag{5}$$

where $w_0, w_1, \dots, w_n$ are given non-negative constants (balancing between the dynamic and static parts), while the positive numbers $v_m$ indicate the relative importance of the individual factors.
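As a concrete reading of (4), here is a minimal NumPy sketch that evaluates the objective for given parameter matrices. The array layout (`y` is $T \times n$, `B` is $M \times n$, `C` is $M \times L$, `D` is $M \times n$) and all names are illustrative assumptions, not part of the original.

```python
import numpy as np

def objective(y, B, C, D, w0, w, L):
    """Evaluate (4): w0 * sum_m var(F_m - F_m_hat) + sum_i w_i var(y_i - y_i_hat).
    Using centred residuals absorbs the intercepts c_m0 and d_0i."""
    F = y @ B.T                                   # factors via (1), shape (T, M)
    T, M = F.shape
    dyn = 0.0
    for m in range(M):
        pred = sum(C[m, k - 1] * F[L - k:T - k, m] for k in range(1, L + 1))
        resid = F[L:, m] - pred                   # AR residual, up to c_m0
        dyn += resid.var()                        # var() centres the residual
    y_hat = F @ D                                 # predictions via (3), up to d_0i
    stat = sum(w[i] * (y[:, i] - y_hat[:, i]).var() for i in range(y.shape[1]))
    return w0 * dyn + stat
```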

Notation In Bánkövi et al., the authors use the same weights

$$v_m = t_2 - t_1 + 1, \quad m = 1, \dots, M.$$

Denote by

$$\bar y_i = \frac{1}{t_2 - t_1 + 1} \sum_{t=t_1}^{t_2} y_i(t)$$

the sample mean (average with respect to the time) of the $i$th component, by

$$\operatorname{cov}(y_i, y_j) = \frac{1}{t_2 - t_1 + 1} \sum_{t=t_1}^{t_2} \bigl(y_i(t) - \bar y_i\bigr)\bigl(y_j(t) - \bar y_j\bigr)$$

the sample covariance between the $i$th and $j$th components, and by

$$\operatorname{cov}^{*}(y_i, y_j) = \frac{1}{t_2 - t_1} \sum_{t=t_1}^{t_2} \bigl(y_i(t) - \bar y_i\bigr)\bigl(y_j(t) - \bar y_j\bigr)$$

the corrected empirical covariance between them.
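In code, the two estimators differ only in the divisor. A small NumPy sketch (the $T \times n$ layout of `y` is our assumption):

```python
import numpy as np

def sample_cov(y):
    """Empirical covariance of a (T, n) data matrix y(t1..t2),
    normalized by T = t2 - t1 + 1 as on the slide."""
    T = y.shape[0]
    yc = y - y.mean(axis=0)          # subtract the time averages
    return (yc.T @ yc) / T

def corrected_cov(y):
    """Corrected (unbiased) version, normalized by T - 1 = t2 - t1."""
    T = y.shape[0]
    yc = y - y.mean(axis=0)
    return (yc.T @ yc) / (T - 1)
```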

The trivial parameters The parameters $c_{m0}$, $d_{0i}$ can be written in terms of the other parameters:

$$c_{m0} = \frac{1}{t_2 - t_1 - L + 1} \sum_{t=t_1+L}^{t_2} \Bigl(F_m(t) - \sum_{k=1}^{L} c_{mk} F_m(t-k)\Bigr), \quad m = 1, \dots, M,$$

and

$$d_{0i} = \bar y_i - \sum_{m=1}^{M} d_{mi} \bar F_m, \quad i = 1, \dots, n.$$

Further notation Thus, the parameters to be estimated are collected in the $M \times n$ matrices $B$, $D$ and in the $M \times L$ matrix $C$. Let $b_m \in \mathbb{R}^n$ denote the $m$th row of matrix $B$, $m = 1, \dots, M$. Let $Y_{ij} := \operatorname{cov}(y_i, y_j)$, $i, j = 1, \dots, n$, and let $Y := (Y_{ij})$ be the $n \times n$ symmetric, positive semidefinite empirical covariance matrix of the sample (sometimes the corrected version is used).

Delayed time series:

$$z_i^m(t) = y_i(t) - \sum_{k=1}^{L} c_{mk}\, y_i(t-k), \qquad t = t_1 + L, \dots, t_2; \; i = 1, \dots, n; \; m = 1, \dots, M, \tag{6}$$

and

$$Z_{ij}^m := \operatorname{cov}(z_i^m, z_j^m) = \frac{1}{t_2 - t_1 - L + 1} \sum_{t=t_1+L}^{t_2} \bigl(z_i^m(t) - \bar z_i^m\bigr)\bigl(z_j^m(t) - \bar z_j^m\bigr), \tag{7}$$

$i, j = 1, \dots, n$, where $\bar z_i^m = \frac{1}{t_2 - t_1 - L + 1} \sum_{t=t_1+L}^{t_2} z_i^m(t)$, $i = 1, \dots, n$; $m = 1, \dots, M$.
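Definitions (6) and (7) translate directly into code; a sketch, assuming `y` is the $T \times n$ data matrix and `c_m` holds the lag coefficients $(c_{m1}, \dots, c_{mL})$ of factor $m$:

```python
import numpy as np

def delayed_covariance(y, c_m, L):
    """Form z_i^m(t) = y_i(t) - sum_k c_mk * y_i(t-k) for
    t = t1+L, ..., t2 as in (6), and return the n x n sample
    covariance matrix Z^m of (7)."""
    T, n = y.shape
    z = y[L:].copy()                      # rows t = t1+L, ..., t2
    for k in range(1, L + 1):
        z -= c_m[k - 1] * y[L - k:T - k]  # subtract the lag-k term
    zc = z - z.mean(axis=0)               # centre with the z-bar means
    return (zc.T @ zc) / z.shape[0]       # normalized by t2 - t1 - L + 1
```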

The objective function revisited Let $Z^m = (Z_{ij}^m)$ be the $n \times n$ symmetric, positive semidefinite covariance matrix of these variables. The objective function (4) to be minimized becomes

$$G(B, C, D) = w_0 \sum_{m=1}^{M} b_m^T Z^m b_m + \sum_{i=1}^{n} w_i Y_{ii} - 2 \sum_{i=1}^{n} w_i \sum_{m=1}^{M} d_{mi} \sum_{j=1}^{n} b_{mj} Y_{ij} + \sum_{i=1}^{n} w_i \sum_{m=1}^{M} d_{mi}^2 v_m,$$

where the minimum is taken subject to the constraints

$$b_m^T Y b_l = \delta_{ml} v_m, \qquad m, l = 1, \dots, M. \tag{8}$$

Outer cycle of the iteration Choosing an initial $B$ satisfying (8), the following two steps are alternated:

1. Starting with $B$, we calculate the $F_m$'s based on (1), then we fit a linear model to estimate the parameters of the autoregressive model (2). Hence, the current value of $C$ is obtained.

2. Based on this $C$, we find the matrices $Z^m$ using (6) and (7) (actually, to obtain $Z^m$, only the $m$th row of $C$ is needed), $m = 1, \dots, M$. Substituting them into $G(B, C, D)$, we take its minimum with respect to $B$ and $D$ while keeping $C$ fixed.

With this $B$, we return to step 1 of the outer cycle and proceed until convergence.
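A schematic sketch of this alternation, under our assumed array layout. Here `minimize_G` is a hypothetical placeholder for step 2 (building the $Z^m$ and minimizing $G$ in $B$ and $D$ via the quadratic-forms algorithm sketched on the following slides); only the AR fit of step 1 is spelled out:

```python
import numpy as np

def fit_ar_coefficients(F, L):
    """Step 1: OLS fit of the AR(L) model (2) for each column of the
    (T, M) factor matrix F; returns C of shape (M, L).  The intercepts
    c_m0 come out of the regression but are not stored."""
    T, M = F.shape
    C = np.zeros((M, L))
    for m in range(M):
        lags = np.column_stack([F[L - k:T - k, m] for k in range(1, L + 1)])
        X = np.column_stack([np.ones(T - L), lags])      # intercept + lags
        coef, *_ = np.linalg.lstsq(X, F[L:, m], rcond=None)
        C[m] = coef[1:]                                  # drop c_m0
    return C

def outer_cycle(y, B0, L, n_sweeps=50):
    """Alternate steps 1 and 2 (a fixed number of sweeps stands in
    for a convergence test, for brevity)."""
    B = B0
    for _ in range(n_sweeps):
        F = y @ B.T                      # factors via (1)
        C = fit_ar_coefficients(F, L)    # step 1
        B, D = minimize_G(y, C, L)       # step 2: hypothetical helper
    return B, C, D
```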

Fixing $C$, the part of the objective function to be minimized in $B$ and $D$ is

$$F(B, D) = w_0 \sum_{m=1}^{M} b_m^T Z^m b_m + \sum_{i=1}^{n} w_i \sum_{m=1}^{M} d_{mi}^2 v_m - 2 \sum_{i=1}^{n} w_i \sum_{m=1}^{M} d_{mi} \sum_{j=1}^{n} b_{mj} Y_{ij}.$$

Taking the derivative with respect to $D$ and substituting the optimal $d_{mi} = \frac{1}{v_m} \sum_{j=1}^{n} b_{mj} Y_{ij}$:

$$F(B, D^{\mathrm{opt}}) = w_0 \sum_{m=1}^{M} b_m^T Z^m b_m - \sum_{i=1}^{n} w_i \sum_{m=1}^{M} \frac{1}{v_m} \Bigl(\sum_{j=1}^{n} b_{mj} Y_{ij}\Bigr)^2.$$

Introducing $V_{jk} = \sum_{i=1}^{n} w_i Y_{ij} Y_{ik}$, $V = (V_{jk})$, and

$$S_m = w_0 Z^m - \frac{1}{v_m} V, \quad m = 1, \dots, M,$$

we have

$$F(B, D^{\mathrm{opt}}) = \sum_{m=1}^{M} b_m^T S_m b_m. \tag{9}$$
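The matrices $V$ and $S_m$ are one line each in code; a sketch (the weight vector `w`, the importances `v`, and the inputs `Y`, `Z_list` are assumed NumPy arrays, names ours):

```python
import numpy as np

def build_S(Y, Z_list, w, w0, v):
    """V_jk = sum_i w_i * Y_ij * Y_ik, i.e. V = Y^T diag(w) Y for the
    symmetric matrix Y; then S_m = w0 * Z^m - V / v_m."""
    V = Y.T @ (w[:, None] * Y)
    return [w0 * Z_list[m] - V / v[m] for m in range(len(Z_list))]
```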

Thus, $F(B, D^{\mathrm{opt}})$ is to be minimized subject to the constraints on the $b_m$'s. Transforming the vectors $b_1, \dots, b_M$ into an orthonormal set, an algorithm for finding extrema of inhomogeneous quadratic forms can be used. The transformation

$$x_m := \frac{1}{\sqrt{v_m}}\, Y^{1/2} b_m, \qquad A_m := v_m\, Y^{-1/2} S_m Y^{-1/2}, \quad m = 1, \dots, M,$$

results in an orthonormal set $x_1, \dots, x_M \in \mathbb{R}^n$; further,

$$F(B, D^{\mathrm{opt}}) = \sum_{m=1}^{M} x_m^T A_m x_m, \tag{10}$$

and by back transformation:

$$b_m^{\mathrm{opt}} = \sqrt{v_m}\, Y^{-1/2} x_m^{\mathrm{opt}}, \quad m = 1, \dots, M.$$
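A sketch of the whitening transformation and its inverse; the matrix powers are taken via the eigendecomposition of the positive definite $Y$ (function names are ours):

```python
import numpy as np

def sym_power(Y, p):
    """Y^p for a symmetric positive definite matrix, via eigendecomposition."""
    w, U = np.linalg.eigh(Y)
    return (U * w**p) @ U.T

def to_orthonormal_problem(Y, S_list, v):
    """A_m = v_m * Y^{-1/2} S_m Y^{-1/2}; the constraint (8) on the b_m
    becomes plain orthonormality of the x_m."""
    Yih = sym_power(Y, -0.5)
    return [v[m] * Yih @ S_list[m] @ Yih for m in range(len(S_list))]

def back_transform(Y, X_opt, v):
    """Columns: b_m_opt = sqrt(v_m) * Y^{-1/2} x_m_opt."""
    return sym_power(Y, -0.5) @ X_opt * np.sqrt(v)   # scales each column
```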

German Federal Republic, 1953–1982. COP: Consumer Prices; INP: Industrial Production; EMP: Employment; WAG: Wages; EXP: Exports; GOC: Government Consumption; GFC: Gross Fixed Capital; PRC: Private Consumption; IMP: Imports; GDP: Gross Domestic Product; CPS: Claims on Private Sector; DOC: Domestic Credit; POP: Population

The first dynamic factor

Further dynamic factors

Factors as linear combinations of variables

Variables as linear combinations of factors

Extrema of sums of inhomogeneous quadratic forms Given the $n \times n$ symmetric matrices $A_1, \dots, A_k$ ($k \le n$), we are looking for an orthonormal set of vectors $x_1, \dots, x_k \in \mathbb{R}^n$ such that

$$\sum_{i=1}^{k} x_i^T A_i x_i \to \max.$$

Theoretical solution By the method of Lagrange multipliers, the $x_i$'s giving the optimum satisfy the system of linear equations

$$\mathcal{A}(X) = X S \tag{11}$$

with some $k \times k$ symmetric matrix $S$, where the $n \times k$ matrices $X$ and $\mathcal{A}(X)$ are as follows:

$$X = (x_1, \dots, x_k), \qquad \mathcal{A}(X) = (A_1 x_1, \dots, A_k x_k).$$

Due to the constraints imposed on $x_1, \dots, x_k$, the non-linear system of equations

$$X^T X = I_k \tag{12}$$

must also hold.

As $X$ and the symmetric matrix $S$ together contain $nk + k(k+1)/2$ free parameters, while equations (11) and (12) provide the same number of equations, a solution of the problem is expected. Transforming (11) into a homogeneous system of linear equations, a non-trivial solution requires that

$$\det(A - S \otimes I_n) = 0 \tag{13}$$

must hold, where the $nk \times nk$ matrix $A$ is the Kronecker sum $A = A_1 \oplus \dots \oplus A_k$ (block diagonal with blocks $A_1, \dots, A_k$; $\otimes$ denotes the Kronecker product). This is a generalization of the eigenvalue problem: the eigenmatrix problem.
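To make the step to (13) explicit, one can stack the columns of $X$ into $\operatorname{vec}(X) \in \mathbb{R}^{nk}$ and use the standard identity $\operatorname{vec}(XS) = (S^T \otimes I_n)\operatorname{vec}(X)$ together with the symmetry of $S$; a short derivation under this column-stacking convention:

```latex
\mathcal{A}(X) = XS
\;\Longleftrightarrow\;
\underbrace{(A_1 \oplus \cdots \oplus A_k)}_{=:A}\,\operatorname{vec}(X)
  = (S \otimes I_n)\,\operatorname{vec}(X)
\;\Longleftrightarrow\;
(A - S \otimes I_n)\,\operatorname{vec}(X) = 0,
```

and a non-trivial $\operatorname{vec}(X)$ exists exactly when the determinant in (13) vanishes.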

Numerical solution Starting from a matrix $X^{(0)}$ with orthonormal columns, the $m$th step of the iteration is obtained from the $(m-1)$th one as follows ($m = 1, 2, \dots$): take the polar decomposition

$$\mathcal{A}(X^{(m-1)}) = X^{(m)} S$$

into an $n \times k$ matrix with orthonormal columns and a $k \times k$ symmetric matrix, and let the first factor be $X^{(m)}$; proceed until convergence. The polar decomposition is obtained by SVD. The above iteration is easily adapted to negative semidefinite or indefinite matrices, and to finding minima instead of maxima.
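A minimal NumPy sketch of this iteration; the random orthonormal start via QR and the stopping rule are our choices, not prescribed by the slide:

```python
import numpy as np

def quad_forms_extrema(A_list, k, n_iter=200, tol=1e-10):
    """Iterate X <- polar factor of A(X) to find orthonormal
    x_1, ..., x_k (locally) maximizing sum_i x_i^T A_i x_i.
    A_list holds the n x n symmetric matrices A_1, ..., A_k."""
    n = A_list[0].shape[0]
    rng = np.random.default_rng(0)
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal start
    for _ in range(n_iter):
        AX = np.column_stack([A_list[i] @ X[:, i] for i in range(k)])
        U, _, Vt = np.linalg.svd(AX, full_matrices=False)
        X_new = U @ Vt                 # polar factor via SVD: AX = (U Vt)(V S V^T)
        if np.linalg.norm(X_new - X) < tol:
            return X_new
        X = X_new
    return X
```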

COMPROMISE FACTOR ANALYSIS A method is introduced for compromise factor extraction from covariance/correlation matrices corresponding to different strata. Compromise factors are independent and, subject to this constraint, they explain the largest possible part of the variables' total variance over the strata. The so-called compromise representation of the strata is introduced. A practical application, parallel factoring of medical data in different strata, is also presented.

Application In biological applications, data are frequently derived from different strata, but the observed variables are the same in each of them. We would like to assign scores to the variables (different ones in different strata) in such a way that, together with the other strata's scores, they accomplish the best possible compromise between the strata. In the case of normally distributed data, the covariance matrices of the same variables are calculated in each stratum separately. In fact, the data need not necessarily be normally distributed, but it is supposed that the covariance structure somehow reflects the interconnection between the variables. One factor is extracted from each stratum. The purpose of compromise factor analysis is similar to that of discriminant analysis. Here, however, we find a linear combination of the variables for each stratum that obeys the orthogonality conditions.

The model Let $\xi_1, \dots, \xi_k$ be $n$-dimensional, normally distributed random variables with positive definite covariance matrices $C_1, \dots, C_k$ ($k \le n$), respectively. Suppose that the mean vectors are zero (otherwise the estimated means are subtracted). Then

$$\xi_i = f + e_i \quad (i = 1, \dots, k),$$

where $f$ and the $e_i$ ($i = 1, \dots, k$) are $n$-dimensional normally distributed random vectors with zero mean and covariance matrices $D$ and $B_i$ ($i = 1, \dots, k$), respectively, and $D$ is supposed to be an $n \times n$ diagonal matrix. The $e_i$'s are mutually independent of each other and of $f$. The random vector $f$ can be thought of as a main common factor of the $\xi_i$'s, while $e_i$ is characteristic of the $i$th stratum or measurement ($i = 1, \dots, k$).

Matrix notation Therefore, $C_i = D + B_i$, and the cross-covariance matrix $\mathbb{E}\,\xi_i \xi_j^T = D$ is the same diagonal matrix with nonnegative diagonal entries for all $i \neq j$. The observed random vectors $\xi_1, \dots, \xi_k$ may also be repeated measurements of $n$ dependent Gaussian variables in the same population. This kind of linear model can be fitted with the usual techniques, and the maximum likelihood estimate of $D$ is constructed on the basis of a sample taken in $k$ non-independent strata, or in the case of $k$ times repeated measurements. To test the diagonality of $D$, a likelihood ratio test is used.

The optimum problem Provided the model fits, we are looking for stochastically independent linear combinations $a_1^T \xi_1, \dots, a_k^T \xi_k$ of the above vector variables such that

$$\sum_{i=1}^{k} \operatorname{Var}(a_i^T \xi_i) = \sum_{i=1}^{k} a_i^T C_i a_i \to \max$$

subject to the following constraints: the vectors $a_i$ are standardized in such a way that $a_i^T D a_i = 1$ ($i = 1, \dots, k$). The constraints together with the independence conditions imply that

$$a_i^T D a_j = \delta_{ij} \quad (i, j = 1, \dots, k).$$

Numerical algorithm By means of the transformations $b_i := D^{1/2} a_i$ ($i = 1, \dots, k$), the optimization problem is equivalent to

$$\sum_{i=1}^{k} b_i^T \bigl(D^{-1/2} C_i D^{-1/2}\bigr) b_i \to \max,$$

where the maximization is over all orthonormal systems $b_1, \dots, b_k \in \mathbb{R}^n$. Since the $n \times n$ matrices $D^{-1/2} C_i D^{-1/2}$ are symmetric, the algorithm constructed for inhomogeneous quadratic forms is applicable. Let $b_1^*, \dots, b_k^*$ denote the compromise system of the matrices $D^{-1/2} C_1 D^{-1/2}, \dots, D^{-1/2} C_k D^{-1/2}$. Finally, by the backward transformations $a_i^* = D^{-1/2} b_i^*$, the linear combinations giving the extremum are obtained.
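Reusing the earlier sketches (`sym_power` and `quad_forms_extrema` are the illustrative helpers defined above, not part of the original), the whole compromise extraction fits in a few lines:

```python
import numpy as np

def compromise_factors(C_list, D):
    """Transform with D^{-1/2}, run the inhomogeneous quadratic-forms
    maximization, and transform back; the columns of the result are
    the compromise coefficient vectors a*_1, ..., a*_k."""
    Dih = sym_power(D, -0.5)                       # D^{-1/2}
    A_list = [Dih @ C @ Dih for C in C_list]       # D^{-1/2} C_i D^{-1/2}
    B_star = quad_forms_extrema(A_list, k=len(C_list))
    return Dih @ B_star                            # a*_i = D^{-1/2} b*_i
```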

A medical application We applied the method to clinical measurements (protein, triglyceride and other organic matter concentrations in the urine) of nephrotic patients. We distinguished three stages of the illness: a no-symptoms stage and two nephrotic stages, one of which is an intermediate stage, while in the other the illness has already seriously developed. First, we tried to perform discriminant analysis for the three groups above, but the difference between them was not really remarkable. We obtained a poor classification, and the canonical variables best discriminating the groups (providing the largest ANOVA F-statistics) did not show a significant difference between the groups. Instead, our program provides a profile of the variables in each group, and remarkable differences in the factor loadings can be observed even in cases when the difference of the covariance/correlation matrices is not so evident.

Compromise factor loadings for three nephrotic stages The total sample consisted of 100 patients. The results for the three stages:

           NO SYMPTOMS   INTERMEDIATE   NEPHROTIC
AT           -0.104339     -0.151711    -0.068392
PC           -0.151864     +0.060398    +0.062981
KO2          -0.355027     -0.662945    -0.423931
TG           -0.134190     -0.372486    +0.781611
HK           -0.241672     +0.194526    +0.421601
LK           +0.496214     -0.543357    +0.149016
PROT         +0.522984     +0.194241    -0.027665
URIN         -0.493607     +0.155758    +0.001543
NAK          -0.014336     +0.005123    +0.001286

Conclusions In the characterization of the no-symptoms stage, the variables PROT, LK and URIN play the most important roles (the former two positively, the latter negatively characterizing the healthy patients). In the seriously nephrotic stage, TG and HK positively, while KO2 negatively, characterize the patients. In the intermediate stage, KO2's effect is also negative (even more so than in the seriously ill stage), while LK's effect is opposite to that in the no-symptoms stage. Thus, one may conclude that mainly the measurements with high loadings in absolute value have to be considered seriously in the diagnosis.