Variable selection in principal components analysis of qualitative data using the accelerated ALS algorithm


Variable selection in principal components analysis of qualitative data using the accelerated ALS algorithm

Masahiro Kuroda (Okayama University of Science), Yuichi Mori (Okayama University of Science), Masaya Iizuka (Okayama University), Michio Sakakihara (Okayama University of Science)

* Supported by the Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Scientific Research (C), No. 20500263.

Motivation

Variable selection in PCA of qualitative data.

The ALS algorithm for quantifying qualitative data (PCA.ALS):
- PRINCIPALS: Young, Takane & de Leeuw (1978) (SAS)
- PRINCALS: Gifi (1990) (SPSS)

Acceleration of PCA.ALS using the vector ε (vε) algorithm:
- vε-PCA.ALS: Kuroda, Mori, Iizuka & Sakakihara (2011) in CSDA

Modified PCA (M.PCA):
- Formulation of M.PCA: Tanaka & Mori (1997)
- Backward elimination & forward selection procedures: Mori et al. (1998, 2006)

Application of vε-PCA.ALS to variable selection in M.PCA of qualitative data.

PCA with variables measured at mixed scale levels

X: an n × p matrix (n observations on p variables; columnwise standardized).

In PCA, X is postulated to be approximated by a bilinear structure of the form

  X̂ = ZA',

where Z is an n × r matrix of component scores on r components (1 ≤ r ≤ p), and A is a p × r matrix consisting of the eigenvectors of X'X/n with A'A = I_r.

We find Z and A such that

  θ = tr(X − X̂)'(X − X̂) = tr(X − ZA')'(X − ZA')

is minimized for the prescribed number of components r.
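For quantitative data this minimization has the familiar closed form via the eigendecomposition of X'X/n. A minimal numerical sketch (NumPy; the variable names are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 100, 5, 2

# Columnwise standardized data matrix X (n observations, p variables).
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# A: the r eigenvectors of X'X/n with the largest eigenvalues (A'A = I_r).
evals, evecs = np.linalg.eigh(X.T @ X / n)
A = evecs[:, ::-1][:, :r]            # columns sorted by decreasing eigenvalue

# Component scores and the rank-r bilinear approximation Xhat = Z A'.
Z = X @ A
Xhat = Z @ A.T

# theta = tr(X - Xhat)'(X - Xhat): the residual sum of squares being minimized.
theta = np.trace((X - Xhat).T @ (X - Xhat))
print(theta <= np.trace(X.T @ X))    # rank-r truncation never increases the loss
```

The residual θ equals n times the sum of the p − r discarded eigenvalues, which is why taking the top-r eigenvectors is optimal.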

PCA with variables measured at mixed scale levels

For only quantitative variables (interval and ratio scales):
We can find Z and A (or X̂ = ZA') minimizing
  θ = tr(X − X̂)'(X − X̂) = tr(X − ZA')'(X − ZA').

For mixed scale variables (nominal, ordinal, interval and ratio scales):
Optimal scaling is necessary to quantify the observed qualitative data, i.e., in addition to Z and A we need to simultaneously find an optimally scaled data matrix X* minimizing
  θ = tr(X* − X̂)'(X* − X̂) = tr(X* − ZA')'(X* − ZA'),
subject to X*'1_n = 0_p and diag[X*'X*/n] = I_p.

Alternating least squares algorithm for finding the optimally scaled data X*

To find the model parameters (Z, A) and the optimal scaling parameter X*, the Alternating Least Squares (ALS) algorithm can be utilized: PCA.ALS (PRINCIPALS, PRINCALS).

- Model parameter estimation step: estimate Z and A conditionally on fixed X*.
- Optimal scaling step: find X* minimizing θ conditionally on fixed Z and A.

[Diagram: the n × p matrix X* alternates between the model parameter estimation step (yielding Z, n × r, and A, p × r) and the optimal scaling step.]

Alternating least squares algorithm for finding the optimally scaled data X*

[PCA.ALS algorithm] PRINCIPALS (Young et al., 1978); the superscript (t) indicates the t-th iteration.

Model parameter estimation step: Obtain A(t) by solving
  [X*(t)'X*(t)/n] A = A D_r,
where A'A = I_r and D_r is an r × r diagonal eigenvalue matrix. Compute Z(t) from Z(t) = X*(t) A(t).

Optimal scaling step: Calculate X̂(t+1) = Z(t) A(t)'. Find
  X*(t+1) = arg min_{X*} tr(X* − X̂(t+1))'(X* − X̂(t+1))
for fixed X̂(t+1) under the measurement restrictions on each variable. Scale X*(t+1) by columnwise normalizing and centering.
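The two steps can be illustrated for purely nominal variables, where the least-squares optimal scaling replaces each category by the mean of the corresponding column of X̂ over the observations in that category. This is a simplified sketch, not the authors' implementation; the function and variable names are mine, and it assumes every category is observed at least once:

```python
import numpy as np

def principals_nominal(G_list, r, max_iter=500, tol=1e-8):
    """Toy PRINCIPALS-style ALS for p nominal variables.

    G_list : list of p indicator (0/1) matrices, one per variable (n x k_j),
             each category assumed to occur at least once.
    Returns the optimally scaled data Xs, scores Z and loadings A.
    """
    n = G_list[0].shape[0]
    p = len(G_list)
    rng = np.random.default_rng(0)
    # Start from random scalings, columnwise centered and normalized.
    Xs = rng.normal(size=(n, p))
    Xs = (Xs - Xs.mean(0)) / Xs.std(0)
    theta_old = np.inf
    for _ in range(max_iter):
        # Model parameter estimation step: top-r eigenvectors of Xs'Xs/n.
        evals, evecs = np.linalg.eigh(Xs.T @ Xs / n)
        A = evecs[:, ::-1][:, :r]
        Z = Xs @ A
        Xhat = Z @ A.T
        # Optimal scaling step: per variable, category means of Xhat's column,
        # then re-impose the centering and unit-variance restrictions.
        for j, G in enumerate(G_list):
            counts = G.sum(0)
            cat_scores = (G.T @ Xhat[:, j]) / counts
            col = G @ cat_scores
            col = col - col.mean()
            sd = col.std()
            Xs[:, j] = col / sd if sd > 0 else col
        theta = np.sum((Xs - Xhat) ** 2)
        if abs(theta_old - theta) < tol:
            break
        theta_old = theta
    return Xs, Z, A

# Demo: three nominal variables, each with three categories.
cats = np.stack([(np.arange(30) + j) % 3 for j in range(3)], axis=1)
G_list = [np.eye(3)[cats[:, j]] for j in range(3)]
Xs, Z, A = principals_nominal(G_list, r=2)
print(Xs.shape, Z.shape, A.shape)
```

Each pass solves the two conditional least-squares subproblems in turn, which is what makes the sequence of θ values converge, if slowly.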

Acceleration of PCA.ALS by the vector ε accelerator

To accelerate the computation, we can use the vector ε accelerator (vε accelerator) of Wynn (1962), which
- speeds up the convergence of a slowly convergent vector sequence,
- is very effective for linearly convergent sequences,
- generates a sequence {Ẏ(t)}_{t≥0} from the iterative sequence {Y(t)}_{t≥0}.

Convergence: the accelerated sequence {Ẏ(t)}_{t≥0} converges to the stationary point Y(∞) of {Y(t)}_{t≥0} faster than {Y(t)}_{t≥0} does.

Computational cost: at each iteration the vε algorithm requires only O(d) arithmetic operations, where d is the dimension of Y, while the Newton-Raphson and quasi-Newton algorithms require O(d³) and O(d²) respectively.

Convergence speed: the best achievable speed of convergence is superlinear.

Acceleration of PCA.ALS by the vector ε accelerator

The vε accelerator is given by

  Ẏ(t−1) = Y(t) + [ [Y(t−1) − Y(t)]^{−1} + [Y(t+1) − Y(t)]^{−1} ]^{−1},

where [Y]^{−1} = Y/||Y||² and ||Y|| is the Euclidean norm of Y. The accelerated vector Ẏ(t−1) is obtained from the three original iterates (Y(t−1), Y(t), Y(t+1)).

The vε accelerator does not depend on the statistical model underlying {Y(t)}_{t≥0}. Therefore, when the vε algorithm is applied to ALS, the convergence properties of the ALS are preserved.
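The update can be coded directly from the formula. A small sketch (NumPy; the function names are mine) of one vε step, demonstrated on a linearly convergent fixed-point iteration:

```python
import numpy as np

def inv_vec(y):
    """Samelson-type inverse [y]^{-1} = y / ||y||^2 used by the vε accelerator."""
    return y / (y @ y)

def veps_step(y_prev, y_curr, y_next):
    """One vε update: Ẏ(t-1) = Y(t) + [[Y(t-1)-Y(t)]^{-1} + [Y(t+1)-Y(t)]^{-1}]^{-1}."""
    return y_curr + inv_vec(inv_vec(y_prev - y_curr) + inv_vec(y_next - y_curr))

# Demo: a linearly convergent iteration y <- c + B y (spectral radius < 1),
# whose fixed point is known in closed form.
B = np.array([[0.9, 0.1], [0.0, 0.8]])
c = np.array([1.0, 1.0])
y_star = np.linalg.solve(np.eye(2) - B, c)     # true fixed point

ys = [np.zeros(2)]
for _ in range(20):
    ys.append(c + B @ ys[-1])

plain_err = np.linalg.norm(ys[20] - y_star)
accel = veps_step(ys[18], ys[19], ys[20])
accel_err = np.linalg.norm(accel - y_star)
print(accel_err < plain_err)   # the extrapolated point is closer to the limit
```

On this geometric sequence a single vε step essentially removes the dominant error component, which is the mechanism the slides exploit on the ALS iterates.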

Acceleration of PCA.ALS by the vector ε accelerator

To accelerate PCA.ALS, we introduce the vε algorithm into PCA.ALS: from the sequence {X*(t)}_{t≥0} = {X*(0), X*(1), ..., X*(∞)} produced by PCA.ALS, we generate an accelerated sequence {Ẋ*(t)}_{t≥0} = {Ẋ*(0), Ẋ*(1), ..., Ẋ*(∞)}.

[General procedure of vε-PCA.ALS] Alternate the following two steps until convergence:

PCA.ALS step: Compute the model parameters A(t) and Z(t) and determine the optimal scaling parameter X*(t+1).

Acceleration step: Calculate Ẋ*(t−1) from {X*(t−1), X*(t), X*(t+1)} using the vε algorithm:
  vec Ẋ*(t−1) = vec X*(t) + [ [vec(X*(t−1) − X*(t))]^{−1} + [vec(X*(t+1) − X*(t))]^{−1} ]^{−1},
where vec X stacks the columns of X into a vector, and check convergence by
  ||vec(Ẋ*(t−1) − Ẋ*(t−2))||² < δ,
where δ is the desired accuracy.

Acceleration of PCA.ALS by the vector ε accelerator

[Diagram: the PCA.ALS step produces the sequence {X*(0), ..., X*(t)}; the acceleration step produces the accelerated sequence {Ẋ*(0), ..., Ẋ*(s)}.]

Acceleration of PCA.ALS by the vector ε accelerator

Since vε-PCA.ALS is designed to generate {Ẋ*(t)}_{t≥0} converging to X*(∞),
- the estimate of X* can be obtained from the final value of {Ẋ*(t)}_{t≥0} when vε-PCA.ALS terminates;
- the estimates of Z and A can then be calculated immediately from the estimate of X* in the model parameter estimation step of PCA.ALS.

Note that the Ẋ*(t−1) obtained at the t-th iteration of the acceleration step is not used as the estimate X*(t+1) at the (t+1)-th iteration of the PCA.ALS step. Thus vε-PCA.ALS speeds up the convergence of {X*(t)}_{t≥0} without affecting the convergence properties of the PCA.ALS procedure.

Modified PCA (M.PCA) of Tanaka & Mori (1997)

Modified PCA (M.PCA) derives principal components that are computed as linear combinations of a subset of variables but can reproduce all the variables very well.

Let X be decomposed into an n × q submatrix X_V1 and an n × (p − q) remaining submatrix X_V2. Then M.PCA finds r linear combinations Z = X_V1 A.

[Diagram: variable selection splits X into X_V1 (n × q) and X_V2 (n × (p − q)); M.PCA computes Z (n × r), which approximates the ordinary PCA scores Z_PCA.]

- Z_PCA = XA, with A obtained from [X'X/n]A = A D_r.
- Z = X_V1 A, with A obtained from [(S_11² + S_12 S_21) − D S_11]A = 0.

Formulation of Modified PCA (M.PCA)

The matrix A consists of the eigenvectors associated with the largest r eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_r and is obtained by solving the eigenvalue problem

  [(S_11² + S_12 S_21) − D S_11]A = 0,

where

  S = ( S_11  S_12
        S_21  S_22 )

is the covariance matrix of X = (X_V1, X_V2) and D is a q × q diagonal matrix of eigenvalues.

A best subset of q variables has the largest value of
- the proportion P = Σ_{j=1}^r λ_j / tr(S), or
- the RV-coefficient RV = {Σ_{j=1}^r λ_j² / tr(S²)}^{1/2}.

We use P as the variable selection criterion.
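The criterion P for a given split of variables can be computed by solving the eigenvalue problem above. A sketch (NumPy; names are mine), rewriting [(S_11² + S_12 S_21) − D S_11]A = 0 as the equivalent eigenproblem S_11^{-1}(S_11² + S_12 S_21)A = AD:

```python
import numpy as np

def mpca_proportion(S, V1, r):
    """P = sum of the r largest eigenvalues over tr(S) for subset V1 (indices)."""
    V1 = list(V1)
    V2 = [j for j in range(S.shape[0]) if j not in V1]
    S11 = S[np.ix_(V1, V1)]
    S12 = S[np.ix_(V1, V2)]
    S21 = S[np.ix_(V2, V1)]
    # [(S11^2 + S12 S21) - D S11] A = 0  <=>  S11^{-1}(S11^2 + S12 S21) A = A D
    M = np.linalg.solve(S11, S11 @ S11 + S12 @ S21)
    lam = np.sort(np.real(np.linalg.eigvals(M)))[::-1]
    return lam[:r].sum() / np.trace(S)

# Sanity check: with V1 = all variables the problem reduces to ordinary PCA.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
S = np.cov(X, rowvar=False)
P_full = mpca_proportion(S, range(6), r=3)
lam_pca = np.sort(np.linalg.eigvalsh(S))[::-1]
print(np.isclose(P_full, lam_pca[:3].sum() / np.trace(S)))  # True
```

When V2 is empty, S_12 S_21 vanishes and the eigenproblem collapses to that of S itself, so P is the usual PCA proportion of variance.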

Variable selection procedures in M.PCA

In order to find a subset of q variables, we employ the two variable selection procedures of Mori et al. (1998, 2006): backward elimination and forward selection. These are cost-saving stepwise selection procedures that remove or add only one variable at a time.

Backward elimination

Stage A: Initial fixed-variables stage
A-1 Assign q variables to the subset X_V1; usually q := p.
A-2 Solve the eigenvalue problem.
A-3 Looking carefully at the eigenvalues, determine the number r of principal components to be used.
A-4 Specify kernel variables which should be involved in X_V1, if necessary. The number of kernel variables must be less than q.

Stage B: Variable selection stage (backward)
B-1 Remove one variable from among the q variables in X_V1, make a temporary subset of size q − 1, and compute P based on that subset. Repeating this for each variable in X_V1 gives q values of P. Find the best subset of size q − 1, i.e., the one providing the largest P among these q values, and remove the corresponding variable from the present X_V1. Put q := q − 1.
B-2 If P or q is larger than its preassigned value, go to B-1. Otherwise stop.
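Stage B can be sketched as a loop over candidate deletions, each scored by P. A hypothetical sketch on a fixed quantitative covariance matrix (in the slides P is recomputed after quantification by PCA.ALS at every evaluation; the helper and all names here are mine):

```python
import numpy as np

def mpca_P(S, V1, r):
    """Criterion P: top-r eigenvalues of S11^{-1}(S11^2 + S12 S21) over tr(S)."""
    V2 = [j for j in range(S.shape[0]) if j not in V1]
    S11 = S[np.ix_(V1, V1)]
    S12 = S[np.ix_(V1, V2)]
    M = np.linalg.solve(S11, S11 @ S11 + S12 @ S12.T)  # S21 = S12' for symmetric S
    lam = np.sort(np.real(np.linalg.eigvals(M)))[::-1]
    return lam[:r].sum() / np.trace(S)

def backward_eliminate(S, r, q_min):
    """B-1/B-2: repeatedly drop the variable whose removal keeps P largest."""
    V1 = list(range(S.shape[0]))
    history = [(len(V1), mpca_P(S, V1, r))]
    while len(V1) > q_min:
        # Try removing each variable in turn; keep the best subset of size q-1.
        scores = [(mpca_P(S, [v for v in V1 if v != j], r), j) for j in V1]
        best_P, drop = max(scores)
        V1.remove(drop)
        history.append((len(V1), best_P))
    return V1, history

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 8))
S = np.cov(X, rowvar=False)
V1, history = backward_eliminate(S, r=3, q_min=4)
print(V1, [round(P, 3) for _, P in history])
```

The forward selection procedure on the next slide is the mirror image: start from the best subset of size r and repeatedly add the variable that increases P the most.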

Forward selection

Stage A: Initial fixed-variables stage
A-1 to A-3: same as A-1 to A-3 in backward elimination.
A-4 Redefine q as the number of kernel variables (here q ≥ r). If you have kernel variables, assign them to X_V1. If not, put q := r, find the best subset of q variables, i.e., the one providing the largest P among all possible subsets of size q, and assign it to X_V1.

Stage B: Variable selection stage (forward)
B-1 Adding one of the p − q variables in X_V2 to X_V1, make a temporary subset of size q + 1 and obtain its P. Repeating this for each variable in X_V2 gives p − q values of P. Find the best subset of size q + 1, i.e., the one providing the largest P among these p − q values, and add the corresponding variable to the present X_V1. Put q := q + 1.
B-2 If P or q is smaller than its preassigned value, go back to B-1. Otherwise stop.

Variable selection in M.PCA

Variable selection here means finding the subset of q variables that best approximates all the variables: find X_V1 using the backward elimination or forward selection procedure.

[Diagram: variable selection splits X into X_V1 and X_V2; M.PCA computes Z (n × r), which approximates Z_PCA. As before, Z_PCA = XA with A from [X'X/n]A = A D_r, and Z = X_V1 A with A from [(S_11² + S_12 S_21) − D S_11]A = 0.]

Variable selection in M.PCA of qualitative data

M.PCA of qualitative data iterates between variable selection and PCA.ALS.

[Diagram: variable selection splits X* into X*_V1 and X*_V2; M.PCA using PCA.ALS computes Z (n × r).]

Steps:
- Variable selection: select X*_V1 using backward elimination or forward selection.
- PCA.ALS: quantify X*_V1 and compute A and Z using the ALS algorithm.

vε-pca.als for variable selectio i M.PCA of qualitative data X p = Variable selectio q p X V 1 X V 2 M.PCA usig vε-pca.als Acceleratio Z r Steps of variable selectio ad PCA.ALS Variable selectio: Select X V1 usig Backward elimiatio or Forward selectio. vε-pca.als: PCA.ALS: Quatify X V 1 ad compute A ad Z usig the ALS algorithm vε acceleratio: Geerate a accelerated sequece of X V 1 usig the vε algorithm Variable selectio i pricipal compoets aalysis of qualitative data usig the accelerated ALS algorithm 21/29

Numerical experiments: variable selection in M.PCA of qualitative data

Computation:
- Algorithms: PRINCIPALS and vε-PRINCIPALS
- Convergence criterion: δ = 10^{-8}
- Number of principal components (PCs): r = 3

Comparison:
- the number of iterations and the CPU time;
- iteration and CPU time speed-ups, defined as
  speed-up = (number of iterations (CPU time) of PRINCIPALS) / (number of iterations (CPU time) of vε-PRINCIPALS).

[Simulation 1]: Computation of PCs
- Sample size (n): 100
- Number of items (p): 40 items with 20 levels (Data 1); 20 items with 10 levels (Data 2)
- Replications: 50

[Simulation 2]: Variable selection in M.PCA
- Sample size (n): 100
- Number of items (p): 10 items with 5 levels (Data 3)

Numerical experiments: Data 1 & 2

Table 1: Summary statistics and speed-ups

(a) Data 1 (100 × 40 with 20 levels)

          PRINCIPALS           vε-PRINCIPALS        Speed-ups
          Iteration  CPU time  Iteration  CPU time  Iteration  CPU time
Min.         208.0     10.38      75.0     4.140      1.340     1.310
1st Qu.      435.8     21.27     160.0     8.268      2.200     2.138
Median       592.5     28.72     215.0    10.930      2.675     2.530
Mean         716.2     34.64     277.8    13.991      2.762     2.623
3rd Qu.      788.2     38.20     318.2    15.900      3.180     2.960
Max.        2682.0    128.54    1107.0    54.260      5.430     5.170

(b) Data 2 (100 × 20 with 10 levels)

          PRINCIPALS           vε-PRINCIPALS        Speed-ups
          Iteration  CPU time  Iteration  CPU time  Iteration  CPU time
Min.         136.0     2.640      46.0     1.070      1.760     1.690
1st Qu.      236.5     4.435      85.0     1.808      2.487     2.272
Median       345.5     6.370     137.0     2.715      3.280     2.760
Mean         437.0     8.021     135.0     2.702      3.232     2.917
3rd Qu.      573.2    10.390     171.2     3.397      3.740     3.410
Max.        1564.0    28.050     348.0     6.560      5.710     5.240

Numerical experiments: Boxplots of iteration and CPU time speed-ups

[Figure: boxplots of the iteration and CPU time speed-ups over the 50 replications, for Data 1 (100 × 40 with 20 levels) and Data 2 (100 × 20 with 10 levels); most speed-ups fall between 2 and 5.]

Numerical experiments: Data 3

Table 2(a): Numbers of iterations and CPU times of PRINCIPALS and vε-PRINCIPALS, and their speed-ups, in application to variable selection for finding a subset of q variables using simulated data.

(a) Backward elimination

                PRINCIPALS            vε-PRINCIPALS        Speed-up
q      Comb.    Iteration  CPU time   Iteration  CPU time  Iteration  CPU time
10         1         141      1.70          48      0.68      2.94      2.49
 9        10        1363     17.40         438      6.64      3.11      2.62
 8         9        1620     20.19         400      5.98      4.05      3.37
 7         8        1348     16.81         309      4.80      4.36      3.50
 6         7        4542     53.72         869     11.26      5.23      4.77
 5         6       13735    159.72        2949     35.70      4.66      4.47
 4         5       41759    482.59       12521    148.13      3.34      3.26
 3         4         124      1.98          44      1.06      2.82      1.86
Total     50       64491    752.40       17530    213.57      3.68      3.52

Numerical experiments: Data 3

Table 2(b): Numbers of iterations and CPU times of PRINCIPALS and vε-PRINCIPALS, and their speed-ups, in application to variable selection for finding a subset of q variables using simulated data.

(b) Forward selection

                PRINCIPALS            vε-PRINCIPALS        Speed-up
q      Comb.    Iteration  CPU time   Iteration  CPU time  Iteration  CPU time
 3       120        4382     67.11        1442     33.54      3.04      2.00
 4         7      154743   1786.70       26091    308.33      5.93      5.79
 5         6       13123    152.72        3198     38.61      4.10      3.96
 6         5        3989     47.02        1143     14.24      3.49      3.30
 7         4        1264     15.27         300      4.14      4.21      3.69
 8         3         340      4.38         108      1.70      3.15      2.58
 9         2         267      3.42          75      1.17      3.56      2.93
10         1         141      1.73          48      0.68      2.94      2.54
Total    148      178249   2078.33       32405    402.40      5.50      5.16

Numerical experiments: Data 3

[Figure: the numbers of iterations (up to about 50,000 for backward elimination and 150,000 for forward selection) and the CPU times (up to about 600 and 2,000 seconds, respectively) plotted against the subset size q, for backward elimination (q from 10 down to 3) and forward selection (q from 3 up to 10).]

Conclusion

[Numerical experiments] The vε acceleration works well to reduce the computational time of PRINCIPALS.

Means of iteration & CPU time speed-ups for the computation of PCs:
- Data 1 (100 × 40 with 20 levels): iteration 2.762 (36%), CPU time 2.623 (38%)
- Data 2 (100 × 20 with 10 levels): iteration 3.232 (31%), CPU time 2.917 (34%)

Iteration & CPU time speed-ups for variable selection (Data 3):
- Backward elimination: iteration 3.68 (27%), CPU time 3.52 (28%)
- Forward selection: iteration 5.50 (18%), CPU time 5.16 (19%)

[Future work] We will apply vε-PRINCIPALS with a re-start procedure to variable selection in PCA of qualitative data.

References

GIFI, A. (1989): Algorithm descriptions for ANACOR, HOMALS, PRINCALS, and OVERALS. Report RR 89-01. Leiden: Department of Data Theory, University of Leiden.

KURODA, M. and SAKAKIHARA, M. (2006): Accelerating the convergence of the EM algorithm using the vector ε algorithm. Computational Statistics and Data Analysis 51, 1549-1561.

KURODA, M., MORI, Y., IIZUKA, M. and SAKAKIHARA, M. (2011): Acceleration of the alternating least squares algorithm for principal components analysis. Computational Statistics and Data Analysis 55, 143-153.

MICHAILIDIS, G. and DE LEEUW, J. (1998): The Gifi system of descriptive multivariate analysis. Statistical Science 13, 307-336.

MORI, Y., TANAKA, Y. and TARUMI, T. (1997): Principal component analysis based on a subset of variables for qualitative data. In: C. Hayashi, K. Yajima, H. Bock, N. Ohsumi, Y. Tanaka, Y. Baba (Eds.): Data Science, Classification, and Related Methods (Proceedings of IFCS-96). Springer-Verlag, 547-554.

YOUNG, F.W., TAKANE, Y. and DE LEEUW, J. (1978): Principal components of mixed measurement level multivariate data: An alternating least squares method with optimal scaling features. Psychometrika 43, 279-281.

WANG, M., KURODA, M., SAKAKIHARA, M. and GENG, Z. (2008): Acceleration of the EM algorithm using the vector epsilon algorithm. Computational Statistics 23, 469-486.

WYNN, P. (1962): Acceleration techniques for iterated vector and matrix problems. Mathematics of Computation 16, 301-322.