A New Robust Partial Least Squares Regression Method
8th of September, 2005
Universidad Carlos III de Madrid, Departamento de Estadística
Outline
- Motivation
- Robust PLS methods
- PLS algorithm
- Computing a robust variance-covariance matrix
- Monte Carlo simulation
- A service quality application
- Conclusions
- Future work
Why robust methods?
"... just which robust methods you use is not important, what is important is that you use some." J. W. Tukey (1979)
Fundamental continuity concept: small changes in the data should result in only small changes in the estimate. "Change a few, so what?" J. W. Tukey (1977)
Outliers
Outliers are atypical observations that are well separated from the bulk of the data. The covariance matrix and the mean vector are very sensitive to outliers in the data.
- 1-D: relatively easy to detect.
- 2-D: harder to detect.
- Higher-D: very hard to detect.
Literature review
- Wakeling and Macfie (1992). First PLSR robustification: an algorithm with weighted regressions for PLS1 and PLS2.
- Griep et al. (1995). Comparative study of LSM, RM, and IRLS (algorithms not resistant to leverage points).
- Gil and Romera (1998). Robustification of the cross-covariance matrix through the SD estimator.
- Hubert and Vanden Branden (2003). PLS robustification based on the SIMPLS algorithm.
Data and general considerations
- We analyze the case q = 1 (a single response variable).
- The population loading vectors are computed as in Helland (1988).
- The optimal number of PLS components is chosen through leave-one-out cross-validation.
- The one-step robustification comes from plugging a robust covariance matrix into the algorithm.
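The leave-one-out choice of the number of components can be sketched as follows. This is a minimal illustration, not the authors' code: `pls1_beta` is a hypothetical helper that builds a basis of the Krylov space which, as Helland (1988) shows, spans the PLS loading space.

```python
import numpy as np

def pls1_beta(X, y, A):
    """PLS1 regression vector with A components, via an orthonormal basis of
    the Krylov space span{s_xy, S_x s_xy, ..., S_x^{A-1} s_xy}."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    Sx = Xc.T @ Xc / (len(y) - 1)
    sxy = Xc.T @ yc / (len(y) - 1)
    K = np.column_stack([np.linalg.matrix_power(Sx, a) @ sxy for a in range(A)])
    W, _ = np.linalg.qr(K)  # orthonormalize for numerical stability
    return W @ np.linalg.solve(W.T @ Sx @ W, W.T @ sxy)

def loo_press(X, y, A_max):
    """PRESS(A) from leave-one-out cross-validation, for A = 1..A_max."""
    n = len(y)
    press = np.zeros(A_max)
    for A in range(1, A_max + 1):
        for i in range(n):
            keep = np.arange(n) != i
            beta = pls1_beta(X[keep], y[keep], A)
            yhat = y[keep].mean() + (X[i] - X[keep].mean(axis=0)) @ beta
            press[A - 1] += (y[i] - yhat) ** 2
    return press

rng = np.random.default_rng(0)
M = np.array([[1.0, 0.4, 0.0], [0.0, 1.0, 0.4], [0.3, 0.0, 1.0]])
X = rng.normal(size=(40, 3)) @ M              # correlated predictors
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
press = loo_press(X, y, 3)
A_opt = int(np.argmin(press)) + 1             # number of components selected
```

The component count with the smallest PRESS is kept; with A equal to the number of predictors the PLS vector reduces to ordinary least squares.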
PLS algorithm
The joint covariance matrix of [y, x] is partitioned as

\Sigma_{[y,x]} = \begin{pmatrix} \sigma_y^2 & \delta_{y,x}^T \\ \delta_{y,x} & \Sigma_x \end{pmatrix}

The population loading vectors are given by Helland (1988):

w_1 \propto \delta_{y,x}
w_{a+1} \propto \delta_{y,x} - \Sigma_x W_a (W_a^T \Sigma_x W_a)^{-1} W_a^T \delta_{y,x}

where W_a = [w_1, w_2, ..., w_a], 1 \le a \le A, collects the loading vectors and A is the selected number of PLS components.
The (non-robust) population regression vector is then given by:

\beta_A = W_A (W_A^T \Sigma_x W_A)^{-1} W_A^T \delta_{y,x}

The proposed global robust algorithm comes from replacing these moments by a robust covariance matrix S_{[y,x]} computed from the original data; in this case w_1 is a normalization of the robust \delta_{y,x} and:

w_{a+1} \propto \delta_{y,x} - S_x W_a (W_a^T S_x W_a)^{-1} W_a^T \delta_{y,x}
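The recursion above makes the one-step robustification mechanical: any estimate of the covariance matrix, classical or robust, can be plugged in. A minimal sketch (illustrative, not the authors' Matlab implementation):

```python
import numpy as np

def pls_beta_from_cov(Sx, sxy, A):
    """PLS regression vector from a (possibly robust) covariance matrix Sx
    and cross-covariance vector sxy, using Helland's recursion:
    w_1 ∝ sxy,  w_{a+1} ∝ sxy - Sx W_a (W_a' Sx W_a)^{-1} W_a' sxy."""
    W = (sxy / np.linalg.norm(sxy)).reshape(-1, 1)
    for _ in range(1, A):
        r = sxy - Sx @ W @ np.linalg.solve(W.T @ Sx @ W, W.T @ sxy)
        W = np.hstack([W, (r / np.linalg.norm(r)).reshape(-1, 1)])
    return W @ np.linalg.solve(W.T @ Sx @ W, W.T @ sxy)

# Sanity check: with A = p the PLS vector coincides with Sx^{-1} sxy.
Sx = np.array([[2.0, 0.0], [0.0, 1.0]])
sxy = np.array([1.0, 1.0])
beta = pls_beta_from_cov(Sx, sxy, 2)   # should equal [0.5, 1.0]
```

Passing the classical sample moments recovers ordinary PLS; passing the robust S of the following slides gives the proposed robust fit.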
Outlier detection and computation of S
The algorithm of Peña and Prieto (2005) works in three steps after the data are scaled and centered, \tilde{x}_i = S^{-1/2}(x_i - \bar{x}), i = 1, ..., n:
- STEP 1: find the directions maximizing and minimizing the kurtosis, project the data onto them, and identify outliers.
- STEP 2: generate random directions, stratify the sample, and identify outliers again.
- STEP 3: check the suspicious observations using the Mahalanobis distance, and repeat until no more outliers are found.
Step I: searching kurtosis directions
The direction that maximizes (minimizes) the coefficient of kurtosis is obtained as the solution of the optimization problem:

d_j = \arg\max_d (\min_d) \; \frac{1}{n} \sum_{i=1}^n \left( d^T \tilde{x}_i^{(j)} \right)^4 \quad \text{s.t. } d^T d = 1

The sample points are then projected onto a lower-dimensional subspace orthogonal to the directions d_j.
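Peña and Prieto solve this optimization exactly with a specialized algorithm; a crude random search over unit directions is enough to illustrate the idea. Note that after whitening every direction has unit sample variance, so the kurtosis coefficient reduces to the fourth moment of the projections (this sketch, including the contaminated toy data, is ours, not from the talk):

```python
import numpy as np

def whiten(X):
    """Scale and center: x_i -> S^{-1/2} (x_i - mean)."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc.T))
    return Xc @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)

def kurtosis_directions(Z, n_dirs=2000, seed=0):
    """Crude random search for the unit directions of maximal and minimal
    projected kurtosis (a stand-in for the exact optimization)."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(n_dirs, Z.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    k = np.mean((Z @ D.T) ** 4, axis=0)   # kurtosis along each direction
    return D[np.argmax(k)], D[np.argmin(k)]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(190, 2)),
               rng.normal(loc=[8, 8], scale=0.1, size=(10, 2))])  # 5% outliers
Z = whiten(X)
d_max, d_min = kurtosis_directions(Z)
k_max = np.mean((Z @ d_max) ** 4)
k_min = np.mean((Z @ d_min) ** 4)
```

With a small asymmetric contamination, the maximizing direction lines up with the outlier cluster and its kurtosis rises well above the Gaussian value of 3, which is exactly why projecting onto it exposes the outliers.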
Why kurtosis directions? Because it is possible to study how outliers affect the kurtosis values and to use this moment coefficient to identify them.
- Symmetric contamination, or a small proportion of outliers generated with asymmetric contamination, increases the kurtosis coefficient of the observed data.
- A large proportion of outliers generated by asymmetric contamination can make the kurtosis coefficient very small.
Step II: searching random directions
- Choose two observations at random from the sample and compute the direction \hat{d}_l defined by these two observations.
- The observations are then projected onto this direction to obtain the values \hat{z}_i^l = \hat{d}_l^T \tilde{x}_i.
- The sample is then partitioned into K groups of size n/K, where K is a prespecified number, based on the ordered values of the projections \hat{z}_i^l, so that group k, 1 \le k \le K, contains those observations i satisfying \hat{z}_{((k-1)n/K+1)}^l \le \hat{z}_i^l \le \hat{z}_{(kn/K)}^l.
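The projection-and-partition step can be sketched as follows (an illustrative helper of ours, not the paper's code):

```python
import numpy as np

def stratify_by_random_direction(Z, K, seed=0):
    """Pick two observations at random, project the sample onto the direction
    they define, and split it into K groups of roughly n/K observations with
    consecutive ordered projections."""
    rng = np.random.default_rng(seed)
    i, j = rng.choice(len(Z), size=2, replace=False)
    d = Z[i] - Z[j]
    d /= np.linalg.norm(d)
    z = Z @ d                      # projections onto the random direction
    order = np.argsort(z)
    groups = np.array_split(order, K)
    return groups, z

rng = np.random.default_rng(2)
Z = rng.normal(size=(100, 4))
groups, z = stratify_by_random_direction(Z, K=5)
```

Each group then yields a subsample of p observations from which an orthogonal direction is computed, as described on the next slide.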
- From each group k, 1 \le k \le K, a subsample of p observations is chosen without replacement, the orthogonal direction is computed, and the corresponding projections are obtained.
Why random directions? Because a procedure is needed that detects outliers when the proportion of contamination is between 0.2 and 0.3 and the contamination distribution has the same variance as the original distribution (the case in which the kurtosis approach fails).
Step III: deleting outliers and computing S
A Mahalanobis distance is computed for all observations labeled as outliers in the preceding steps. Let U be the set of all observations not labeled as outliers:

m = \frac{1}{|U|} \sum_{i \in U} x_i
S = \frac{1}{|U|-1} \sum_{i \in U} (x_i - m)(x_i - m)^T
v_i = (x_i - m)^T S^{-1} (x_i - m)

Those observations i \notin U such that v_i < \chi^2_{p,0.99} are considered not to be outliers and are included in U.
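The re-admission check can be sketched as below. The 0.99 chi-square quantile is an assumption on our part about the exact percentile; the function names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def recheck_suspects(X, suspects):
    """Step III sketch: recompute mean and covariance from the observations
    not labeled as outliers, then readmit any suspect whose squared
    Mahalanobis distance falls below a chi-square cutoff (0.99 quantile
    assumed here)."""
    n, p = X.shape
    mask = np.ones(n, dtype=bool)
    mask[suspects] = False
    U = X[mask]                       # observations not labeled as outliers
    m = U.mean(axis=0)
    S = np.cov(U.T)
    Sinv = np.linalg.inv(S)
    diff = X[suspects] - m
    v = np.einsum("ij,jk,ik->i", diff, Sinv, diff)  # squared Mahalanobis
    cutoff = chi2.ppf(0.99, df=p)
    readmitted = np.asarray(suspects)[v < cutoff]
    return readmitted, v

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
X[:5] += 12.0                        # genuine outliers
suspects = [0, 1, 2, 3, 4, 10, 11]   # 10 and 11 were flagged by mistake
readmitted, v = recheck_suspects(X, suspects)
```

In the full algorithm this check is repeated until no more outliers are found, and the final m and S feed the robust PLS fit.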
Simulation study
- Methods compared: PLS (Helland, 1988), PLS-SD (Gil and Romera, 1998), RSIMPLS (Hubert and Vanden Branden, 2003), and the proposed method (González, Peña and Romera).
- Four types of outliers were generated: bad leverage points, vertical outliers, orthogonal outliers, and very concentrated outliers.
- Three measures were used for comparing the methods.
- Simulations were run on a Pentium III 650 MHz personal computer with 128 MB of memory. Code implemented in Matlab.
Simulation study: bilinear model

T \sim N_A(0_A, \Sigma_t), with A < p
X = T I_{A,p} + N_p(0_p, 0.1 I_p)
Y = T Q + N_q(0_q, I_q)

where (I_{A,p})_{i,j} = 1 for i = j and 0 for i \ne j, and Q is an A \times q matrix with Q_{i,j} = 1 for all i, j.

The simulation is done with a known value A = A_{opt}, randomly generating a number n_\epsilon of outliers. Settings: q = 1, \Sigma_t = diag(4, 2), \sigma = 1, with 10% and 30% contamination.
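Generating clean data from this bilinear model is straightforward; here is a sketch under the stated settings (the values of n and p are our illustrative choices):

```python
import numpy as np

def simulate_clean(n, p, A, Sigma_t, sigma_y=1.0, q=1, seed=0):
    """Draw from the bilinear model: T ~ N_A(0, Sigma_t),
    X = T I_{A,p} + N_p(0, 0.1 I), Y = T Q + N_q(0, sigma_y^2 I),
    with Q an A x q matrix of ones."""
    rng = np.random.default_rng(seed)
    T = rng.multivariate_normal(np.zeros(A), Sigma_t, size=n)
    I_Ap = np.eye(A, p)               # 1 on the diagonal, 0 elsewhere
    X = T @ I_Ap + rng.normal(scale=np.sqrt(0.1), size=(n, p))
    Q = np.ones((A, q))
    Y = T @ Q + rng.normal(scale=sigma_y, size=(n, q))
    return T, X, Y

T, X, Y = simulate_clean(n=200, p=6, A=2, Sigma_t=np.diag([4.0, 2.0]))
```

The first A columns of X carry the latent signal (variance roughly Σ_t plus noise) while the remaining columns are pure noise, which is what makes PLS dimension reduction effective on this design.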
Simulation study: outliers
- Bad leverage regression points: T_\epsilon \sim N_A(10_A, \Sigma_t), X_\epsilon = T_\epsilon I_{A,p} + N_p(0_p, 0.1 I_p)
- Vertical outliers: Y_\epsilon = T Q_{A,q} + N_q(10_q, 0.1 I_q)
- Orthogonal outliers: X_\epsilon = T_\epsilon I_{A,p} + N_p((0_A, 10_{p-A}), 0.1 I_p)
- Very concentrated outliers: X_\epsilon = T_\epsilon I_{A,p} + N_p(10_p, 0.001 I_p)
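The contamination mechanisms can be sketched as below. This is a simplified version of ours that perturbs an existing (X, Y) sample directly, rather than reproducing the paper's exact generators:

```python
import numpy as np

def contaminate(X, Y, n_eps, kind, A, Sigma_t, seed=0):
    """Replace or shift the first n_eps rows with one of the four outlier
    types from the simulation design (a sketch of the mechanisms)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Y.shape[1]
    I_Ap = np.eye(A, p)
    Xc, Yc = X.copy(), Y.copy()
    if kind == "bad_leverage":      # latent scores shifted to mean 10
        T_eps = rng.multivariate_normal(10 * np.ones(A), Sigma_t, size=n_eps)
        Xc[:n_eps] = T_eps @ I_Ap + rng.normal(scale=np.sqrt(0.1), size=(n_eps, p))
    elif kind == "vertical":        # response shifted, predictors untouched
        Yc[:n_eps] += rng.normal(loc=10, scale=np.sqrt(0.1), size=(n_eps, q))
    elif kind == "orthogonal":      # shift in the noise coordinates only
        mu = np.concatenate([np.zeros(A), 10 * np.ones(p - A)])
        Xc[:n_eps] += mu + rng.normal(scale=np.sqrt(0.1), size=(n_eps, p))
    elif kind == "concentrated":    # tight cluster far from the bulk
        T_eps = rng.multivariate_normal(10 * np.ones(A), Sigma_t, size=n_eps)
        Xc[:n_eps] = T_eps @ I_Ap + 10 + rng.normal(scale=np.sqrt(0.001), size=(n_eps, p))
    return Xc, Yc

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
Y = rng.normal(size=(100, 1))
Xb, Yb = contaminate(X, Y, 10, "bad_leverage", A=2, Sigma_t=np.diag([4.0, 2.0]))
```

Each type stresses a different weakness: leverage points distort the predictor space, vertical outliers distort the response, orthogonal outliers leave the latent plane, and concentrated outliers mimic a spurious cluster.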
MC simulation study: measures
- The angle between the experimental slope of each method and the true one:

ang\beta_{1,A} = ang(\beta, \hat{\beta}_{[y_c,x_c],A})

- The mean squared error of the norms:

MSE_A(\hat{\beta}) = \frac{1}{m} \sum_{l=1}^m \| \hat{\beta}_A^{(l)} - \beta \|^2

- A test set of n_t observations is generated with the original model, and we compute:

RMSE_A = \sqrt{ \frac{1}{n_t} \sum_{i=1}^{n_t} (y_i - \hat{y}_{i,A})^2 }

where \hat{y}_{i,A} is the predicted value of y for observation i with A components.
MC simulation study: 10% contamination (results figure).
MC simulation study: 30% contamination (results figure).
RENFE data
- 17 independent variables representing measures of the service of RENFE (the public railroad system in Spain): station security, train cleanliness, noise level, ...
- One dependent variable corresponding to a measure of the customers' global satisfaction with the service quality.
- All variables were evaluated on a 0-9 scale.
The sample includes 1499 questionnaires and is available in
The first principal component explains 93.4% of the variability.

Computational times
Algorithm: PLS | PLS-KurSD | RSIMPLS
Time (sec.):
Conclusions
The proposed method:
- behaves well with any type of contamination and is easy to compute.
- is resistant to outliers even with a large percentage of contamination.
- is very fast and is useful for large data sets.
Future work
- Extend the work to the case of several dependent variables.
- Develop a robust PLS for the case p > n.
- Analyze in which cases a robustification of the regression is necessary.
4th Symposium on PLS and Related Methods, Barcelona, 7th-9th of September, 2005
More informationMedian Cross-Validation
Median Cross-Validation Chi-Wai Yu 1, and Bertrand Clarke 2 1 Department of Mathematics Hong Kong University of Science and Technology 2 Department of Medicine University of Miami IISA 2011 Outline Motivational
More informationSparse representation classification and positive L1 minimization
Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Predictive Models Predictive Models 1 / 34 Outline
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1
Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of
More informationPart I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes
Week 2 Based in part on slides from textbook, slides of Susan Holmes Part I Other datatypes, preprocessing October 3, 2012 1 / 1 2 / 1 Other datatypes Other datatypes Document data You might start with
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationMultivariate coefficient of variation for functional data
Multivariate coefficient of variation for functional data Mirosław Krzyśko 1 and Łukasz Smaga 2 1 Interfaculty Institute of Mathematics and Statistics The President Stanisław Wojciechowski State University
More informationK. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =
K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing
More informationMeasure-Transformed Quasi Maximum Likelihood Estimation
Measure-Transformed Quasi Maximum Likelihood Estimation 1 Koby Todros and Alfred O. Hero Abstract In this paper, we consider the problem of estimating a deterministic vector parameter when the likelihood
More informationNon-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators
Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators Stefano Marchetti 1 Nikos Tzavidis 2 Monica Pratesi 3 1,3 Department
More informationRobustness of Principal Components
PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.
More informationSTAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 49 Outline 1 How to check assumptions 2 / 49 Assumption Linearity: scatter plot, residual plot Randomness: Run test, Durbin-Watson test when the data can
More informationFAST CROSS-VALIDATION IN ROBUST PCA
COMPSTAT 2004 Symposium c Physica-Verlag/Springer 2004 FAST CROSS-VALIDATION IN ROBUST PCA Sanne Engelen, Mia Hubert Key words: Cross-Validation, Robustness, fast algorithm COMPSTAT 2004 section: Partial
More informationA Significance Test for the Lasso
A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen May 14, 2013 1 Last time Problem: Many clinical covariates which are important to a certain medical
More informationStatistics II Lesson 1. Inference on one population. Year 2009/10
Statistics II Lesson 1. Inference on one population Year 2009/10 Lesson 1. Inference on one population Contents Introduction to inference Point estimators The estimation of the mean and variance Estimating
More information4 Multiple Linear Regression
4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ
More informationEstimation of the Response Mean. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 27
Estimation of the Response Mean Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5 / 27 The Gauss-Markov Linear Model y = Xβ + ɛ y is an n random vector of responses. X is an n p matrix
More informationMachine Learning - MT Linear Regression
Machine Learning - MT 2016 2. Linear Regression Varun Kanade University of Oxford October 12, 2016 Announcements All students eligible to take the course for credit can sign-up for classes and practicals
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationEconometrics I. Ricardo Mora
Econometrics I Department of Economics Universidad Carlos III de Madrid Master in Industrial Economics and Markets Outline Motivation 1 Motivation 2 3 4 Motivation The Analogy Principle The () is a framework
More informationRobust multivariate methods in Chemometrics
Robust multivariate methods in Chemometrics Peter Filzmoser 1 Sven Serneels 2 Ricardo Maronna 3 Pierre J. Van Espen 4 1 Institut für Statistik und Wahrscheinlichkeitstheorie, Technical University of Vienna,
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationWWTP diagnosis based on robust principal component analysis
WWTP diagnosis based on robust principal component analysis Y. Tharrault, G. Mourot, J. Ragot, M.-F. Harkat Centre de Recherche en Automatique de Nancy Nancy-université, CNRS 2, Avenue de la forêt de Haye
More informationLecture 14: Shrinkage
Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the
More informationA ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI *, R.A. IPINYOMI **
ANALELE ŞTIINłIFICE ALE UNIVERSITĂłII ALEXANDRU IOAN CUZA DIN IAŞI Tomul LVI ŞtiinŃe Economice 9 A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI, R.A. IPINYOMI
More informationSimple Linear Regression
Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent
More informationStochastic Design Criteria in Linear Models
AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework
More informationFactor Analysis (10/2/13)
STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.
More informationSampling Distributions in Regression. Mini-Review: Inference for a Mean. For data (x 1, y 1 ),, (x n, y n ) generated with the SRM,
Department of Statistics The Wharton School University of Pennsylvania Statistics 61 Fall 3 Module 3 Inference about the SRM Mini-Review: Inference for a Mean An ideal setup for inference about a mean
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationSTAT 350: Geometry of Least Squares
The Geometry of Least Squares Mathematical Basics Inner / dot product: a and b column vectors a b = a T b = a i b i a b a T b = 0 Matrix Product: A is r s B is s t (AB) rt = s A rs B st Partitioned Matrices
More informationA Short Introduction to Curve Fitting and Regression by Brad Morantz
A Short Introduction to Curve Fitting and Regression by Brad Morantz bradscientist@machine-cognition.com Overview What can regression do for me? Example Model building Error Metrics OLS Regression Robust
More information