Robust Classification for Skewed Data


Advances in Data Analysis and Classification manuscript No. (will be inserted by the editor)

Robust Classification for Skewed Data

Mia Hubert and Stephan Van der Veeken

Received: date / Accepted: date

Abstract In this paper we propose a robust classification rule for skewed unimodal distributions. For low-dimensional data, the classifier is based on minimizing the adjusted outlyingness to each group. For high-dimensional data, the robustified SIMCA method is adjusted for skewness. The robustness of the methods is investigated through several simulations and by applying them to some real data sets.

Keywords Robustness · Classification · Skewness

1 Introduction

Given a random sample from each of k populations, constructing a rule to classify a new observation into one of the k populations is a widely studied problem, known as classification, discriminant analysis or supervised learning. Many of the proposed classification methods rely on quite strict distributional assumptions such as multivariate normality, or at least elliptical symmetry. Examples include the most popular ones, such as Fisher's linear discriminant rule (LDA) and classical quadratic discriminant analysis (CQDA) (Johnson and Wichern, 1998). A second drawback is that these classical procedures are sensitive to outliers. Therefore, over the last years, several robustifications have been proposed. One group of methods relies on robust estimators of location and scatter, and consequently implicitly assumes elliptical symmetry (Hubert and Van Driessen, 2004; He and Fung, 2000; Croux and Dehon, 2001).

We acknowledge financial support by the GOA/07/04 project of the Research Fund K.U.Leuven and by the IAP research network no. P6/03 of the Federal Science Policy, Belgium.

M. Hubert (corresponding author)
Department of Mathematics - LStat, Katholieke Universiteit Leuven, Celestijnenlaan 200B, BE-3001 Heverlee, Belgium. E-mail: mia.hubert@wis.kuleuven.be

S. Van der Veeken
Department of Mathematics - LStat, Katholieke Universiteit Leuven, Celestijnenlaan 200B, BE-3001 Heverlee, Belgium

Proposals for skewed data are hardly available. An interesting approach is based on the concept of depth and has been proposed in Ghosh and Chaudhuri (2005). A new observation is classified into the population for which it attains maximal depth. The most common robust depth functions include Tukey's halfspace depth (Tukey, 1975) and Liu's simplicial depth (Liu, 1990). The resulting classifiers achieve a good classification performance (especially when outliers are present), but they have the drawback that data outside the convex hull of every group have zero depth, in which case the classification becomes ambiguous. Moreover, they are hard to compute. For example, the computation of the Tukey depth has $O(n^{p-1}\log n)$ time complexity (Rousseeuw and Struyf, 1998). For simplicial depth, an $O(n\log n)$ algorithm exists in $p = 2$ dimensions (Rousseeuw and Ruts, 1996), an $O(n^2)$ algorithm in $p = 3$ dimensions and an $O(n^4)$ algorithm in $p = 4$ dimensions (Cheng and Ouyang, 2001). For higher-dimensional spaces, there are no known algorithms faster than $O(n^{p+1})$.

In this paper we propose a classification rule based on the adjusted outlyingness, which is related to projection depth (Zuo and Serfling, 2000). Hence, it can be seen as an extension of the projection depth classifiers studied in Dutta and Ghosh (2009) for elliptical data. In low dimensions this leads to a robust classifier which can be computed more easily. Moreover, for high-dimensional data it also allows for a generalization of the Soft Independent Modelling by Class Analogy (SIMCA) method (Wold, 1976). To construct our new method, we combine robust PCA for skewed data (Hubert et al., 2009) with the RSIMCA method for elliptical data (Vanden Branden and Hubert, 2005).

In Section 2 we focus on the low-dimensional case. We introduce the new classification rule, report simulation results and apply our method to a real data set. Section 3 contains the methodology and the results for high-dimensional data, whereas Section 4 concludes.

2 Low dimensional data

2.1 Construction of the classification rule

Outlyingness

We assume we have sampled observations from $k$ different classes $X^j$, $j = 1, \dots, k$. The data belonging to group $X^j$ are denoted by $x_i^j$ for $i = 1, \dots, n_j$. The dimension of the data space is $p$ and is assumed to be much smaller than the sample sizes. More comments on the dimension are given at the end of this section.

Our new classification rule is defined as follows: for each new observation $y$ to be classified, we first compute its adjusted outlyingness with respect to each group $X^j$. Then we assign $y$ to the group for which its adjusted outlyingness is minimal.

The adjusted outlyingness was introduced in Brys et al. (2005) and studied in detail in Hubert and Van der Veeken (2008). It generalizes the Stahel-Donoho outlyingness towards skewed data. We first describe the concept of outlyingness for an observation $x_i^j$ coming from the class $X^j = (x_1^j, \dots, x_{n_j}^j)^t$. For univariate data $x_i^j$, the Stahel-Donoho outlyingness is defined as (Stahel, 1981; Donoho, 1982):

$$ \mathrm{SDO}^{(1)}(x_i^j, X^j) = \frac{|x_i^j - \mathrm{med}(X^j)|}{s(X^j)} $$

where $\mathrm{med}(X^j)$ is the median of the data and $s(X^j)$ is a robust scale estimate such as the MAD. For skewed distributions, a different scale is used on both sides of the median. First we measure the amount of skewness by means of the medcouple (MC), a robust measure of skewness (Brys et al., 2004). For the $j$th (univariate) group, it is defined as

$$ \mathrm{MC}(X^j) = \operatorname*{med}_{x_i^j < \mathrm{med}(X^j) < x_l^j} h(x_i^j, x_l^j) \qquad (1) $$

with $\mathrm{med}(X^j)$ the sample median, and

$$ h(x_i^j, x_l^j) = \frac{(x_l^j - \mathrm{med}(X^j)) - (\mathrm{med}(X^j) - x_i^j)}{x_l^j - x_i^j}. \qquad (2) $$

If $\mathrm{MC}(X^j) \geq 0$, the adjusted outlyingness is defined as:

$$ \mathrm{AO}^{(1)}(x_i^j, X^j) = \begin{cases} \dfrac{x_i^j - \mathrm{med}(X^j)}{c_2 - \mathrm{med}(X^j)} & \text{if } x_i^j > \mathrm{med}(X^j) \\[2ex] \dfrac{\mathrm{med}(X^j) - x_i^j}{\mathrm{med}(X^j) - c_1} & \text{if } x_i^j < \mathrm{med}(X^j) \end{cases} $$

where $c_1$ corresponds to the smallest observation greater than $Q_1 - 1.5\,e^{-4\,\mathrm{MC}}\,\mathrm{IQR}$ and $c_2$ to the largest observation smaller than $Q_3 + 1.5\,e^{3\,\mathrm{MC}}\,\mathrm{IQR}$. The notations $Q_1$ and $Q_3$ stand for the first and third quartile of the data, and $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range. If $\mathrm{MC}(X^j) < 0$, the adjusted outlyingness is computed on the inverted data $-X^j$.

In order to define the adjusted outlyingness of a multivariate data point $x_i^j$, the data are projected on all possible directions $a$ and the $\mathrm{AO}^{(1)}$ is computed. The overall $\mathrm{AO}_i^j$ is then defined as the supremum over all univariate AO's:

$$ \mathrm{AO}_i^j = \mathrm{AO}(x_i^j, X^j) = \sup_{a \in \mathbb{R}^p} \mathrm{AO}^{(1)}(a^t x_i^j, X^j a). \qquad (3) $$

As the AO is based on robust measures of location, scale and skewness, it is resistant to outliers. In theory, a resistance of up to 25% of outliers can be achieved, although we noticed in practice that the medcouple might become quite sensitive to contamination of more than 10%. Moreover, it can be shown that the influence function of the univariate $\mathrm{AO}^{(1)}$ is bounded (Hubert and Van der Veeken, 2008).

Since in practice it is impossible to consider all possible directions $a$, we compute the $\mathrm{AO}^{(1)}$ in $m = 250p$ directions. This yields an overall computation time of $O(mn\log n)$, using the fast $O(n\log n)$ algorithm to compute the MC (Brys et al., 2004). Random directions are generated as the direction perpendicular to the subspace spanned by $p$ observations, randomly drawn from the data set. As such, the AO is invariant to affine transformations of the data. Note that this procedure can only be applied in our classification setting when $p < n_j$, and when the dimension $p$ is not too large (say $p < 10$). Otherwise taking $250p$ directions is insufficient and more directions are required to achieve good estimates. We do not consider this an important drawback of our rule, as it is well known that skewness is only an issue in small dimensions. As the dimensionality increases, the data are more and more concentrated in an outer shell of the distribution; see for example Hastie et al. (2001). Of course, the algorithm can easily be adapted to search over more than $m$ directions, but this will come at the cost of more computation time.
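For concreteness, the computation of the AO along these lines can be sketched in a few lines of Python. This is only an illustration, not the authors' LIBRA implementation: the function names are ours, the medcouple is computed with a naive $O(n^2)$ double loop instead of the $O(n\log n)$ algorithm of Brys et al. (2004), and the random directions are taken orthogonal to hyperplanes through $p$ randomly drawn observations as described above.

```python
import numpy as np

def medcouple(x):
    """Naive O(n^2) medcouple of Brys et al. (2004); assumes no ties with the median."""
    x = np.sort(np.asarray(x, dtype=float))
    med = np.median(x)
    lower, upper = x[x < med], x[x > med]
    if lower.size == 0 or upper.size == 0:
        return 0.0
    xi, xl = lower[:, None], upper[None, :]
    h = ((xl - med) - (med - xi)) / (xl - xi)       # kernel h of equation (2)
    return float(np.median(h))

def adjusted_outlyingness_1d(y, x):
    """AO^(1) of the points y with respect to the univariate sample x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mc = medcouple(x)
    if mc < 0:                                      # mirror the data so that MC >= 0
        return adjusted_outlyingness_1d(-y, -x)
    med = np.median(x)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    c1 = x[x >= q1 - 1.5 * np.exp(-4 * mc) * iqr].min()   # smallest obs above lower fence
    c2 = x[x <= q3 + 1.5 * np.exp(3 * mc) * iqr].max()    # largest obs below upper fence
    return np.where(y > med, (y - med) / (c2 - med), (med - y) / (med - c1))

def random_directions(X, m, rng):
    """m directions, each orthogonal to the hyperplane through p randomly drawn points."""
    n, p = X.shape
    dirs = np.empty((m, p))
    for k in range(m):
        pts = X[rng.choice(n, size=p, replace=False)]
        _, _, vt = np.linalg.svd(pts[1:] - pts[0])  # null-space direction of their span
        dirs[k] = vt[-1]
    return dirs

def adjusted_outlyingness(Y, X, m=None, rng=None):
    """AO (equation (3)) of the rows of Y with respect to the data set X."""
    rng = np.random.default_rng(0) if rng is None else rng
    m = 250 * X.shape[1] if m is None else m        # 250p directions, as in the text
    ao = np.zeros(len(Y))
    for a in random_directions(X, m, rng):
        ao = np.maximum(ao, adjusted_outlyingness_1d(Y @ a, X @ a))
    return ao
```

In the classification rule described next, this function is evaluated with Y equal to the new observations and X equal to (a cleaned version of) each training group.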

New robust classification method

To apply our new classification rule, we now have to define the outlyingness of a new observation $y$ with respect to each group $X^j$. One approach would be to compute this outlyingness $\mathrm{AO}(y, \check{X}^j)$ according to (3), with $\check{X}^j$ the augmented data set $\check{X}^j = (x_1^j, \dots, x_{n_j}^j, y)^t$. This would of course become computationally very demanding when many new observations need to be classified, as the augmented data set is then modified for each new observation and the median, the IQR and the medcouple have to be recomputed each time on every projection. Hence, we compute the outlyingness of $y$ with respect to a fixed data set which does not include $y$. A natural candidate is of course $X^j$ itself. However, we obtain better results when we first remove the outliers from $X^j$.

As explained in Hubert and Van der Veeken (2008), this can easily be done by first computing the adjusted outlyingness $\mathrm{AO}_i^j$ of all observations in group $X^j$. Then the univariate outlyingness of every $\mathrm{AO}_i^j$ ($i = 1, \dots, n_j$) is computed. Formally, we define the outlier score $\mathrm{OS}_i = \mathrm{AO}^{(1)}(\mathrm{AO}_i^j, \{\mathrm{AO}_i^j\})$. Observations with a too large outlyingness are then defined as those $x_i^j$ for which $\mathrm{AO}_i^j > \mathrm{med}(\mathrm{AO}_i^j)$ and $\mathrm{OS}_i > 1$ (or, equivalently, with $\mathrm{AO}_i^j > c_2$). Those observations are removed from $X^j$, yielding the reduced data set $\tilde{X}^j$. To compute the outlyingness of a new case $y$, we then consider $\mathrm{AO}(y, \tilde{X}^j)$, so we fix the median, IQR and medcouple of the projected outlier-free data from group $j$. The observation $y$ is then assigned to the group for which its adjusted outlyingness is minimal. (In case of ties, which occur with probability zero, we randomly assign the observation to one of the groups involved.)
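A minimal sketch of the resulting rule, reusing the medcouple and adjusted_outlyingness helpers from the previous sketch (again illustrative, not the LIBRA code): each training group is first cleaned by removing the observations whose AO exceeds the cutoff $c_2$ of the group's own AO values, and a new observation is then assigned to the group for which its AO with respect to the cleaned data is smallest.

```python
import numpy as np
# assumes medcouple() and adjusted_outlyingness() from the previous sketch

def trim_outliers(X, rng=None):
    """Remove the observations of X whose AO lies above the adjusted-boxplot cutoff."""
    ao = adjusted_outlyingness(X, X, rng=rng)
    q1, q3 = np.percentile(ao, [25, 75])
    cutoff = q3 + 1.5 * np.exp(3 * medcouple(ao)) * (q3 - q1)
    return X[ao <= cutoff]          # equivalent to keeping the points with AO_i <= c_2

def classify_min_ao(Y, groups, rng=None):
    """Assign each row of Y to the group with minimal adjusted outlyingness."""
    cleaned = [trim_outliers(X, rng=rng) for X in groups]
    # for clarity the per-direction medians, IQRs and medcouples are recomputed here;
    # in practice they would be stored once per (trimmed) group and reused
    scores = np.column_stack([adjusted_outlyingness(Y, X, rng=rng) for X in cleaned])
    return np.argmin(scores, axis=1)
```

For a two-group problem, classify_min_ao(Y_test, [X1, X2]) returns labels 0 and 1.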

2.2 Simulation results

In order to illustrate the classification improvement obtained by accounting for skewness, we performed a simulation study. In all simulation settings, clean data were generated from a multivariate skew-normal distribution (Azzalini and Dalla Valle, 1996). A $p$-dimensional random variable $Z$ is said to be standard skew-normal distributed, $Z \sim SN_p(\bar{\Omega}, \alpha)$, if its density function is of the form

$$ f(z) = 2\,\phi_p(z; \bar{\Omega})\,\Phi(\alpha^t z) \qquad (4) $$

where $\phi_p(z; \bar{\Omega})$ is the $p$-dimensional normal density with zero mean and correlation matrix $\bar{\Omega}$, $\Phi$ is the c.d.f. of the standard normal distribution and $\alpha$ is a $p$-dimensional vector that regulates the skewness. The mean vector and the covariance matrix are given by

$$ \mu_z = \sqrt{\tfrac{2}{\pi}}\,\delta, \qquad \mathrm{cov}(Z) = \bar{\Omega} - \mu_z \mu_z^t \qquad (5) $$

where

$$ \delta = \frac{\bar{\Omega}\alpha}{\sqrt{1 + \alpha^t \bar{\Omega} \alpha}}. \qquad (6) $$

By adding a location vector $\mu$ and a scale matrix $W = \mathrm{diag}(\omega_1, \dots, \omega_p)$, with all $\omega_j > 0$, we obtain the skew-normal random variable $X = \mu + W Z \sim SN_p(\mu, \Omega, \alpha)$ with $\Omega = W \bar{\Omega} W$. Using the notation $0_p = (0, 0, \dots, 0)^t \in \mathbb{R}^p$, note that if $\alpha = 0_p$ the skew-normal density reduces to the normal density. We also consider contaminated training data, in which $\varepsilon = 5\%$ or $\varepsilon = 10\%$ of the observations come from a normal distribution.

First we consider the two-class problem ($k = 2$). With $1_p = (1, 1, \dots, 1)^t \in \mathbb{R}^p$, the different simulation settings can be described as follows:

1. (a) $p = 2$, $n_1 = n_2 = 25$, $X^1 \sim SN_2(0_2, I_2, \cdot)$, $X^2 \sim SN_2(2\cdot 1_2, I_2, \cdot)$
(b) Outliers: $\varepsilon = 10\%$ observations from $X^1 \sim N_2(\cdot, 0.2\, I_2)$
2. (a) $p = 3$, $n_1 = n_2 = 5$, $X^1 \sim SN_3(0_3, I_3, \cdot)$, $X^2 \sim SN_3(2\cdot 1_3, I_3, \cdot)$
(b) Outliers: $\varepsilon = 10\%$ observations from $X^1 \sim N_3(\cdot, 0.2\, I_3)$
3. (a) $p = 5$, $n_1 = n_2 = 5$, $X^1 \sim SN_5(0_5, I_5, \cdot)$, $X^2 \sim SN_5(1_5, I_5, \cdot)$
(b) Outliers: $\varepsilon = 10\%$ observations from $X^1 \sim N_5(1_5, 0.1\, I_5)$

From each population we randomly generate $n_j$ training observations and $n_j/5$ test data. On the training data we apply five classifiers:

- Classical Quadratic Discriminant Analysis (CQDA);
- Robust Linear Discriminant Analysis (RLDA) by means of the MCD (Hubert and Van Driessen, 2004);
- Robust Quadratic Discriminant Analysis (RQDA) by means of the MCD (Hubert and Van Driessen, 2004);
- our new classifier based on the Adjusted Outlyingness (AO);
- Least Squares Support Vector Machines (LS-SVM) classification (Suykens et al., 2002) with a radial basis function (RBF) kernel.

The first three programs are available in the Matlab library LIBRA (Verboven and Hubert, 2005), whereas the LS-SVM classifier is computed with LS-SVMlab (Suykens et al., 2002). We have added the LS-SVM classifier since it is a popular classifier for many complex nonlinear problems. Because of the computational complexity of halfspace depth and simplicial depth, and as they do not provide a classification in case of ties (equal depth to several groups), we did not include these maximum depth-based classifiers.

The results of the simulations are summarized in terms of average misclassification errors. The misclassification error is defined as the overall proportion of wrongly classified observations in the test sets. The results listed in Table 1 are average misclassification errors with their respective standard errors over the simulation runs. In all situations we see that our new method outperforms the classical and robust methods that are based on elliptical symmetry. The performance is also slightly better than that of the LS-SVM classifier. The main difference between these two approaches, however, lies in the computation time. Whereas our method based on the adjusted outlyingness only requires choosing the number of directions $a$ in (3) for computational reasons, the LS-SVM method requires the choice of two hyperparameters (the width of the RBF kernel and the regularization constant). If a 10-fold cross-validation approach is followed, as implemented in LS-SVMlab, performing classification problem (2a) with LS-SVM takes considerably longer, whereas the AO classifier only needs 7.4 seconds when $250p = 750$ directions are considered. Note that, to decrease the large computation time of LS-SVM in our simulations, we first estimated the hyperparameters on a number of training data sets for each simulation setting and computed their average value. We then performed the replications with these fixed values for the hyperparameters. We checked for setting (1a) that this procedure did not give a significantly different result (.9 (s.e. .1) with fixed values for the hyperparameters versus .89 (s.e. .1) with re-estimated hyperparameters).
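The skew-normal training data in the settings above can be generated from the stochastic representation underlying (4)-(6) (Azzalini and Dalla Valle, 1996): if $(X_0, Z)$ is $(p+1)$-variate normal with standardized margins, correlation matrix $\bar{\Omega}$ for $Z$ and correlations $\delta$ from (6) between $X_0$ and $Z$, then $Z$ conditional on $X_0 > 0$ follows $SN_p(\bar{\Omega}, \alpha)$, and the conditioning can be realized by a sign flip. A minimal sketch (illustrative function name, not the code used for the simulations):

```python
import numpy as np

def rskewnormal(n, mu, omega, Omega_bar, alpha, rng=None):
    """Draw n samples from SN_p(mu, Omega, alpha) with Omega = W Omega_bar W and
    W = diag(omega), via the representation of Azzalini and Dalla Valle (1996)."""
    rng = np.random.default_rng() if rng is None else rng
    mu, omega = np.asarray(mu, float), np.asarray(omega, float)
    alpha, Omega_bar = np.asarray(alpha, float), np.asarray(Omega_bar, float)
    p = mu.size
    delta = Omega_bar @ alpha / np.sqrt(1.0 + alpha @ Omega_bar @ alpha)   # equation (6)
    big = np.block([[np.ones((1, 1)), delta[None, :]],
                    [delta[:, None],  Omega_bar]])        # correlation matrix of (X0, Z)
    draws = rng.multivariate_normal(np.zeros(p + 1), big, size=n)
    x0, z = draws[:, 0], draws[:, 1:]
    z = np.where(x0[:, None] > 0, z, -z)                  # condition on X0 > 0 by sign flip
    return mu + z * omega                                  # X = mu + W Z
```

Setting 1(a), for example, corresponds to calls of the form rskewnormal(n, np.zeros(2), np.ones(2), np.eye(2), alpha) and rskewnormal(n, 2 * np.ones(2), np.ones(2), np.eye(2), alpha) with the skewness vectors of that setting, after which the $\varepsilon$-fraction of outliers is replaced by normal draws.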

Table 1 Simulation results for the two-class problem on asymmetric distributions: average misclassification errors (with standard errors in parentheses) of CQDA, RLDA, RQDA, AO and LS-SVM for the 2D, 3D and 5D settings, with $\varepsilon = 0\%$ and $\varepsilon = 10\%$ contamination.

We also simulated some symmetric distributions. For this, we used the following settings:

4. (a) $p = 2$, $n_1 = n_2 = 25$, $X^1 \sim N_2(0_2, I_2)$, $X^2 \sim N_2(2\cdot 1_2, I_2)$
(b) Outliers: $\varepsilon = 10\%$ observations from $X^1 \sim N_2(1_2, 0.2\, I_2)$
5. (a) $p = 3$, $n_1 = n_2 = 5$, $X^1 \sim N_3(0_3, I_3)$, $X^2 \sim N_3(2\cdot 1_3, I_3)$
(b) Outliers: $\varepsilon = 10\%$ observations from $X^1 \sim N_3(1_3, 0.2\, I_3)$
6. (a) $p = 5$, $n_1 = n_2 = 5$, $X^1 \sim N_5(0_5, I_5)$, $X^2 \sim N_5(1_5, I_5)$
(b) Outliers: $\varepsilon = 10\%$ observations from $X^1 \sim N_5(5\cdot 1_5, 0.1\, I_5)$

The results listed in Table 2 are again the average misclassification errors with their respective standard errors. Here, we notice that the average misclassification error of our new method is somewhat higher than for the other robust methods, which are designed for elliptical distributions. But contrary to CQDA, we see that our AO method is not highly affected by outliers. Note that the optimal error rates are 0.0787 for the 2D setting, 0.0416 for the 3D setting and 0.1318 for the 5D setting. The results obtained by the robust methods RLDA and RQDA lie quite close to these optimal values.

Table 2 Simulation results for the two-class problem on symmetric distributions: average misclassification errors (with standard errors in parentheses) of CQDA, RLDA, RQDA, AO and LS-SVM for the 2D, 3D and 5D settings, with $\varepsilon = 0\%$ and $\varepsilon = 10\%$ contamination.

Table 3 lists the results for the three-class problem ($k = 3$). As LS-SVM cannot directly cope with more than two groups, we have not included it in this comparison. Here, the data from the first two groups were generated as before, and a third group was added. Outliers are now generated in each group.

7. (a) $p = 2$, $n_3 = 25$, $X^3 \sim SN_2((3, 1)^t, I_2, (5, 5)^t)$
(b) Outliers: 5% observations from $X^1 \sim N_2(-2\cdot 1_2, 0.2\, I_2)$, 5% of $X^2 \sim N_2((8, 0)^t, 0.2\, I_2)$ and 5% of $X^3 \sim N_2((0, 8)^t, 0.2\, I_2)$
(c) Idem as (b) but with 10% outliers in each group.
8. (a) $p = 3$, $n_3 = 5$, $X^3 \sim SN_3((3, 1, 3)^t, I_3, (5, 5, 5)^t)$

(b) Outliers: 5% observations from $X^1 \sim N_3(-2\cdot 1_3, 0.1\, I_3)$, 5% of $X^2 \sim N_3((6, 0, 1)^t, 0.1\, I_3)$ and 5% of $X^3 \sim N_3((0, 6, 1)^t, 0.1\, I_3)$
(c) Idem as (b) but with 10% outliers in each group.
9. (a) $p = 5$, $n_3 = 5$, $X^3 \sim SN_5(0.5\cdot 1_5, I_5, \cdot)$
(b) Outliers: 5% observations from $X^1 \sim N_5(\cdot, 0.1\, I_5)$, 5% of $X^2 \sim N_5(3\cdot 1_5, 0.1\, I_5)$ and 5% of $X^3 \sim N_5((4, 4, 4, 0, 0)^t, 0.1\, I_5)$
(c) Idem as (b) but with 10% outliers in each group.

Also here we can conclude that we achieve lower misclassification errors by adjusting for skewness. In Figure 1 we illustrate the different classifiers by plotting the data generated in setting 7(b). The white region marks the set of test data that will be classified into class 3, the light-colored region indicates the data that will be classified into class 1, whereas the dark-colored region visualizes the classification region of class 2. We see in Figure 1(a) that CQDA does not separate the regular observations of the three groups well, as the classification rules are attracted by the outliers. The robust methods RLDA in Figure 1(b) and RQDA in Figure 1(c) are clearly less influenced by the outlying cases, but still misclassify some observations that lie in between the groups. The AO classifier shown in Figure 1(d) yields very precise classification boundaries.

2.3 Example

The data used in this example come from the Belgian Household Survey of 2005. The Household Survey is a multi-purpose continuous survey carried out by the Social Survey Division of the Belgian National Institute of Statistics, which collects information on people living in private households in Belgium. The main aim of the survey is to collect data on a range of topics such as education, welfare, family structure and health. One of the most important direct applications of the survey is the computation of the evolution of purchasing power. Therefore, quite some data on income and expenditure are collected. We selected a subset of 11 variables from the data set.

Table 3 Simulation results for the three-class problem on asymmetric distributions: average misclassification errors (with standard errors in parentheses) of CQDA, RLDA, RQDA and AO for the 2D, 3D and 5D settings, with $\varepsilon = 0\%$, 5% and 10% contamination in each group.

The most important variable is the yearly income of the respondent. The other variables are all expenditures on different types of goods, such as health, food and leisure. It is generally known that the percentage of income spent on a specific type of good is a function of the yearly revenue of an individual. A good is called a luxury good if the relative amount of income spent on it is an increasing function of the income. If the function is constant, one deals with normal goods. In the case of inferior goods the function decreases. This means that the spending pattern changes with the yearly income of a person. In order to avoid correcting factors for family size, only single persons are considered. This group of single persons consists of 174 unemployed and 76 (at least partially) employed persons. With our analysis we try to determine whether a person is employed or not by looking at his or her spending pattern and income.

In this section we only consider the variables Income and Expenditure on durable consumer goods (which are luxury goods). Figure 2(a) is a scatterplot of the data indicating both groups. As the group of employed people is highly spread out, we have also plotted the lower left part of the data (Figure 2(b)), in which both groups can be better distinguished. We notice the skewness in both classes, as well as some overlap between the groups. Both groups are repeatedly split at random into a training set and a test set which contains 20% of the data. CQDA yields an average misclassification error of 0.1914 with a standard error of .16. Due to the fact that there are quite some outliers, RQDA improves this result to 0.1617 with a standard error of .17. Our skewness-adjusted robust classifier yields the smallest misclassification error, 0.1394, with a standard deviation of .85.

Fig. 1 Class acceptance regions for the three-class simulation data (case 7(b)) for the four classifiers: (a) Classical Quadratic Discriminant Analysis (CQDA), (b) Robust Linear Discriminant Analysis (RLDA), (c) Robust Quadratic Discriminant Analysis (RQDA), (d) Adjusted Outlyingness (AO).

3 High dimensional data

3.1 Construction of the classification rule

To classify high-dimensional data, the analysis often starts with applying a dimension reduction technique such as Principal Component Analysis (PCA) to the whole data set. Next, discriminant analysis for low-dimensional data is applied to the resulting PCA scores; see e.g. Hubert and Engelen (2004), where a robust PCA is followed by RQDA to classify mice receiving different cancer therapies based on high-dimensional NMR spectra. However, it is not always useful to perform one single PCA on the entire data set, as the optimal subspaces and the dimensionality of the different classes $X^j$ can be different.

Fig. 2 Scatterplot of expenditure on durable consumer goods versus income: (a) complete data set; (b) observations with a yearly income smaller than 50 000 euro and a yearly expenditure on durable goods smaller than 6 000 euro.

A way to deal with such classes of different dimensionality is the so-called Soft Independent Modelling by Class Analogy (SIMCA) method. The idea behind SIMCA (Wold, 1976) is to apply a (classical) PCA in each group, thereby retaining within each group a sufficient number of principal components to account for most of the variation. For each new observation $y$, we can then compute its projection onto the PCA subspace of group $j$, denoted by $\hat{y}^j$. Roughly speaking, the SIMCA classification rule is then obtained by combining the orthogonal (Euclidean) distance (OD) of $y$ to the $j$th class,

$$ \mathrm{OD}^j(y) = \| y - \hat{y}^j \| \qquad (7) $$

and the score distance (SD). The score distance is the Mahalanobis distance measured within the PCA subspace. For a new observation $y$, the score distance with respect to the $j$th class is given by

$$ \mathrm{SD}^j(y) = \sqrt{(t^j)^t (L^j)^{-1} t^j}. $$

Here $t^j = (P^j)^t (y - \bar{x}^j)$ is the score of $y$ with respect to the $j$th class, $P^j$ the matrix of the loadings of group $j$, $L^j$ the diagonal matrix of the eigenvalues and $\bar{x}^j$ the empirical mean of the observations of $X^j$.

Since the SIMCA method is based on classical PCA, it is inherently sensitive to outliers. In order to robustify SIMCA, the RSIMCA classifier has been developed (Vanden Branden and Hubert, 2005), which is mostly based on the robust PCA method ROBPCA (Hubert et al., 2005). This ROBPCA method uses concepts from the Stahel-Donoho outlyingness and the MCD estimator, and consequently it is mostly suited for elliptically distributed data. As we want to account for skewness, we propose a classification method which is similar to RSIMCA, but which is based on a robust PCA approach for skewed data (Hubert et al., 2009) as well as on the adjusted outlyingness (3). The method, denoted RSIMCA-AO, consists of the following steps:

1. First we apply ROBPCA for skewed data within each group. To this end,
(a) we compute the AO of all observations in $X^j$. Then we take the set $I_h$ of the $h_j$ (with $[n_j/2] \leq h_j \leq n_j$) data points with smallest AO, and we compute their mean $\hat{\mu}_0(X^j)$ and covariance matrix $\hat{\Sigma}_0(X^j)$. Note that $h_j$ should be smaller than the number of non-outlying objects in $X^j$. In our examples we have always set $h_j = 0.9\,n_j$, but this can of course be modified if a smaller number of regular observations is to be expected.
(b) In each group we reduce the dimension by projecting all data onto the $l_j$-dimensional subspace $V_0^j$ spanned by the first $l_j$ eigenvectors of $\hat{\Sigma}_0(X^j)$.
(c) For each observation $x_i^j$, we compute its orthogonal distance $\mathrm{OD}_i^j = \| x_i^j - \hat{x}_i^j \|$, with $\hat{x}_i^j$ the projection of $x_i^j$ on the subspace $V_0^j$. We then obtain an improved robust subspace estimate $V_1^j$ as the subspace spanned by the $l_j$ dominant eigenvectors of $\hat{\Sigma}_1$, which is the covariance matrix of all observations $x_i^j$ for which $\mathrm{OD}_i^j \leq c_{\mathrm{OD}}^j$. The cutoff value $c_{\mathrm{OD}}^j$ is the largest $\mathrm{OD}_i^j$ smaller than $Q_3\{\mathrm{OD}\} + 1.5\,e^{3\,\mathrm{MC}\{\mathrm{OD}\}}\,\mathrm{IQR}\{\mathrm{OD}\}$, where $\{\mathrm{OD}\}$ denotes the set $\{\mathrm{OD}_i^j,\ i = 1, \dots, n_j\}$. Next, we project all data points of all groups onto their respective PCA subspace $V_1^j$, yielding the final $\hat{x}_i^j$.
(d) We compute the $\mathrm{AO}_i^j$ of the $\hat{x}_i^j$. As in the low-dimensional case, we retain the estimates of the median, MAD and IQR in each direction.
2. To classify a new observation $y$, we consider two distances. First, we look at the orthogonal Euclidean distance $\mathrm{OD}^j(y)$, as in (7), of $y$ to the subspace $V_1^j$. Next, we compute the adjusted outlyingness AO of $\hat{y}^j$ with respect to the $\hat{x}_i^j$. As explained in Section 2.1, we reduce the computation time by reusing the results obtained when deriving the $\mathrm{AO}_i^j$. To make sure that neither of the two distances dominates the other, both measures are rescaled. For the OD we use $c_{\mathrm{OD}}^j$, whereas for the AO we define $c_{\mathrm{AO}}^j$, with $\{\mathrm{AO}\} = \{\mathrm{AO}_i^j,\ i = 1, \dots, n_j\}$, as the largest $\mathrm{AO}_i^j$ smaller than $Q_3\{\mathrm{AO}\} + 1.5\,e^{3\,\mathrm{MC}\{\mathrm{AO}\}}\,\mathrm{IQR}\{\mathrm{AO}\}$.
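The classical ingredients entering this rule, namely the orthogonal distance (7), the score distance and the adjusted-boxplot cutoff used for the rescaling, are easy to write down explicitly. The sketch below is only an illustration with classical PCA (via the SVD) and our own function names; it is not the ROBPCA-for-skewed-data pipeline of step 1, and it reuses the medcouple helper from the sketch in Section 2.1.

```python
import numpy as np
# assumes medcouple() from the sketch in Section 2.1

def pca_model(X, l):
    """Mean, loading matrix P (p x l) and eigenvalues of a classical PCA of one group."""
    xbar = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - xbar, full_matrices=False)
    return xbar, vt[:l].T, s[:l] ** 2 / (len(X) - 1)

def od_sd(y, model):
    """Orthogonal distance (7) and score distance SD^j(y) of one new observation y."""
    xbar, P, eigvals = model
    t = P.T @ (y - xbar)                       # scores t^j = (P^j)^t (y - xbar^j)
    od = np.linalg.norm(y - (xbar + P @ t))    # OD^j(y) = ||y - y_hat^j||
    sd = np.sqrt(np.sum(t ** 2 / eigvals))     # SD^j(y) = sqrt((t^j)^t (L^j)^{-1} t^j)
    return od, sd

def adjusted_cutoff(values):
    """Largest value not exceeding Q3 + 1.5 exp(3 MC) IQR, used as c_OD or c_AO."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    fence = q3 + 1.5 * np.exp(3 * medcouple(v)) * (q3 - q1)
    return v[v <= fence].max()
```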

Finally, similarly as in Vanden Branden and Hubert (2005), we consider a linear combination of the two squared rescaled distances,

$$ \gamma \left( \frac{\mathrm{OD}^j(y)}{c_{\mathrm{OD}}^j} \right)^2 + (1 - \gamma) \left( \frac{\mathrm{AO}^j(y)}{c_{\mathrm{AO}}^j} \right)^2. \qquad (8) $$

Observation $y$ is now classified into the group $j$ for which (8) is minimal. The parameter $\gamma \in [0, 1]$ determines the relative weight that is given to both distance measures. If there is no prior information on the relative importance of both measures, $\gamma$ can be chosen in such a way that the misclassification error is minimized.

Remark: The computation of the AO in step 1(a) is carried out in the space spanned by the data points. When $p > n_j$, a preliminary SVD decomposition of $X^j$ leads to a data space of dimension $r \leq n_j - 1$. As $r$ will generally still be large, the computation of the AOs can no longer be done as in the low-dimensional setting. First note that in the PCA setting we further assume that the data can be well represented in a lower-dimensional space of dimension $k \ll r$, so the intrinsic problem is still low-dimensional. Secondly, we no longer need an affine equivariant AO measure, but only a procedure which is orthogonally equivariant. Therefore, we generate $m = \min(250p, 2500)$ directions $a$ through 2 data points and then take the maximum of the $\mathrm{AO}^{(1)}$'s over those projections.
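Given the per-group distances and cutoffs, the final assignment step of RSIMCA-AO reduces to a few lines. A minimal sketch of rule (8) with illustrative argument names (the arrays od and ao hold $\mathrm{OD}^j(y)$ and $\mathrm{AO}^j(y)$ for every observation and group, and c_od, c_ao the cutoffs $c_{\mathrm{OD}}^j$ and $c_{\mathrm{AO}}^j$):

```python
import numpy as np

def rsimca_ao_assign(od, ao, c_od, c_ao, gamma):
    """Assign each observation to the group minimizing criterion (8).

    od and ao have shape (n_obs, k); c_od and c_ao have length k; gamma lies in [0, 1]."""
    crit = gamma * (od / c_od) ** 2 + (1.0 - gamma) * (ao / c_ao) ** 2
    return np.argmin(crit, axis=1)
```

When no prior weighting is available, gamma can simply be chosen by evaluating the resulting misclassification error on a validation set over a grid of values, as is done for the 11 equidistant $\gamma$ values in the simulations below.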

3.2 Simulation results

To illustrate the effectiveness of the skewness adjustment, RSIMCA-AO is compared to the SIMCA and RSIMCA methods in a simulation study. First we consider an outlier-free situation. In $p = 500$ dimensions, three groups of observations are generated from a skew-normal distribution. The first group has its center at $0_p = (0, 0, \dots, 0)^t \in \mathbb{R}^p$. Its covariance matrix has the first three canonical vectors as dominant directions, and they explain 98% of the variability; the skewness vector $\alpha$ is set to $(2, 0_{p-1})^t$. The second group has its center at $(5, 0_{p-1})^t$ and has only two important directions (the first and the second canonical vector), yielding 94% of explained variance; the skewness parameter equals $(1, 0_{p-1})^t$. The center of the third class is located at $(-5, 0_{p-1})^t$. Its first four directions explain 95% of the variance, while its skewness parameter is $(-1, 0_{p-1})^t$. From each group we generate 40 training data and 80 validation cases.

Table 4 summarizes the results of applying the three different classifiers. Average misclassification errors (and corresponding standard errors) are reported over the simulation runs, for 11 equidistant $\gamma$ values in $[0, 1]$. Note that all methods require choosing the number of principal components within each group. Here, we have used $l_1 = 3$, $l_2 = 2$ and $l_3 = 4$, according to the true dimensionality of the data. We see that, in this outlier-free setting, the performance of SIMCA and RSIMCA is very comparable. RSIMCA-AO gives better results over a broad range of $\gamma$ values and attains the minimal classification error of 0.0484 at $\gamma = 0.3$. The result for $\gamma = 0.2$ is very close, with an error of 0.0488.

Table 4 Simulation results for uncontaminated high-dimensional data: average misclassification errors (with standard errors in parentheses) of SIMCA, RSIMCA and RSIMCA-AO for 11 equidistant values of $\gamma$ in $[0, 1]$.

In order to illustrate the robustness towards outliers, we next introduced 5% orthogonal outliers in the first and the second group and 5% bad leverage points in the third group (for a precise definition of the types of outliers, see Hubert et al. (2009)). The outliers follow the same distributions as the regular observations, but their center is changed to $(0, 0, 0, 5, 0_{496})^t$ for the first group, $(5, 0, 5, 0_{497})^t$ for the second group and $(0, 0, 0, 0, 1, 0_{495})^t$ for the last group. It is known that orthogonal contamination lifts the classical PCA subspace towards the outliers and that bad leverage points tilt the subspace to accommodate all the outliers. From Table 5 we see indeed that the contamination has a high impact on the performance of classical SIMCA. The robust methods are quite resistant towards outliers. Again RSIMCA-AO obtains the smallest misclassification error, with the minimal value of 0.0511 at $\gamma = 0.2$.

Table 5 Simulation results for contaminated high-dimensional data: average misclassification errors (with standard errors in parentheses) of SIMCA, RSIMCA and RSIMCA-AO for 11 equidistant values of $\gamma$ in $[0, 1]$.

3.3 Example

We reconsider the social data example from Section 2.3, but now we take all 11 variables into account: Income and expenditures on clothing, alcoholic drinks, durable consumer goods, energy, food, health, leisure, nonalcoholic drinks, transport and housing. On this data set we apply SIMCA, RSIMCA and RSIMCA-AO. For each method we retained three principal components in each group; for RSIMCA-AO, for example, this explained 87% of the variance of the first group and 89% of the second group. In Figure 3 the average misclassification errors are plotted as a function of $\gamma$ (based on repeated random splits into a training set and a test set). It can be clearly seen that the skewness adaptation gives smaller misclassification errors for all values of $\gamma$.

Fig. 3 Average misclassification errors for the social data as a function of $\gamma$, for the different classification methods.

4 Conclusion and outlook

In this paper we have shown how classical and robust classifiers for elliptical data can be adjusted for skewness. In the low-dimensional case we adapt the maximum depth

classification method by using the concept of adjusted outlyingness. This is a fast projection pursuit procedure which does not need any tuning parameter; only the number of projections used in its computation needs to be chosen. Note that when the sample sizes differ strongly among the classes, some modifications to this classification rule can be made, leading to even smaller misclassification errors. This has been studied in Hubert and Van der Veeken (2010). For high-dimensional data we modify the RSIMCA approach by using a robust PCA method for skewed data for the dimension reduction and the AO within the PCA subspaces. Simulation results and an application to a real data set show the benefits of these new procedures. The programs will become available as part of LIBRA (Verboven and Hubert, 2005) at wis.kuleuven.be/stat/robust.

References

Azzalini A, Dalla Valle A (1996) The multivariate skew-normal distribution. Biometrika 83:715-726
Brys G, Hubert M, Rousseeuw PJ (2005) A robustification of Independent Component Analysis. Journal of Chemometrics 19
Brys G, Hubert M, Struyf A (2004) A robust measure of skewness. Journal of Computational and Graphical Statistics 13:996-1017
Cheng AY, Ouyang M (2001) On algorithms for simplicial depth. In: Proceedings of the 13th Canadian Conference on Computational Geometry
Croux C, Dehon C (2001) Robust linear discriminant analysis using S-estimators. The Canadian Journal of Statistics 29
Donoho DL (1982) Breakdown properties of multivariate location estimators. PhD thesis, Harvard University
Dutta S, Ghosh AK (2009) On robust classification using projection depth. Indian Statistical Institute, Technical Report
Ghosh AK, Chaudhuri P (2005) On maximum depth and related classifiers. Scandinavian Journal of Statistics 32(2)
Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning. Springer, New York
He X, Fung WK (2000) High breakdown estimation for multiple populations with applications to discriminant analysis. Journal of Multivariate Analysis 72
Hubert M, Engelen S (2004) Robust PCA and classification in biosciences. Bioinformatics 20
Hubert M, Rousseeuw PJ, Vanden Branden K (2005) ROBPCA: a new approach to robust principal component analysis. Technometrics 47:64-79
Hubert M, Rousseeuw PJ, Verdonck T (2009) Robust PCA for skewed data. Computational Statistics and Data Analysis 53:2264-2274
Hubert M, Van der Veeken S (2008) Outlier detection for skewed data. Journal of Chemometrics 22
Hubert M, Van der Veeken S (2010) Fast and robust classifiers adjusted for skewness. In: Proceedings of COMPSTAT 2010. Springer
Hubert M, Van Driessen K (2004) Fast and robust discriminant analysis. Computational Statistics and Data Analysis 45:301-320
Johnson RA, Wichern DW (1998) Applied multivariate statistical analysis. Prentice Hall Inc., Englewood Cliffs, NJ

Liu RY (1990) On a notion of data depth based on random simplices. The Annals of Statistics 18(1)
Rousseeuw PJ, Ruts I (1996) Bivariate location depth. Applied Statistics 45
Rousseeuw PJ, Struyf A (1998) Computing location depth and regression depth in higher dimensions. Statistics and Computing 8
Stahel WA (1981) Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. PhD thesis, ETH Zürich
Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J (2002) Least Squares Support Vector Machines. World Scientific, Singapore
Tukey JW (1975) Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians, Volume 2
Vanden Branden K, Hubert M (2005) Robust classification in high dimensions based on the SIMCA method. Chemometrics and Intelligent Laboratory Systems 79:10-21
Verboven S, Hubert M (2005) LIBRA: a Matlab library for robust analysis. Chemometrics and Intelligent Laboratory Systems 75
Wold S (1976) Pattern recognition by means of disjoint principal component models. Pattern Recognition 8
Zuo Y, Serfling R (2000) General notions of statistical depth function. The Annals of Statistics 28


ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Compactly supported RBF kernels for sparsifying the Gram matrix in LS-SVM regression models

Compactly supported RBF kernels for sparsifying the Gram matrix in LS-SVM regression models Compactly supported RBF kernels for sparsifying the Gram matrix in LS-SVM regression models B. Hamers, J.A.K. Suykens, B. De Moor K.U.Leuven, ESAT-SCD/SISTA, Kasteelpark Arenberg, B-3 Leuven, Belgium {bart.hamers,johan.suykens}@esat.kuleuven.ac.be

More information

Identification of Multivariate Outliers: A Performance Study

Identification of Multivariate Outliers: A Performance Study AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 127 138 Identification of Multivariate Outliers: A Performance Study Peter Filzmoser Vienna University of Technology, Austria Abstract: Three

More information

Table of Contents. Multivariate methods. Introduction II. Introduction I

Table of Contents. Multivariate methods. Introduction II. Introduction I Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation

More information

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis .. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make

More information

Cellwise robust regularized discriminant analysis

Cellwise robust regularized discriminant analysis Cellwise robust regularized discriminant analysis JSM 2017 Stéphanie Aerts University of Liège, Belgium Ines Wilms KU Leuven, Belgium Cellwise robust regularized discriminant analysis 1 Discriminant analysis

More information

Fast Bootstrap for Least-square Support Vector Machines

Fast Bootstrap for Least-square Support Vector Machines ESA'2004 proceedings - European Symposium on Artificial eural etworks Bruges (Belgium), 28-30 April 2004, d-side publi., ISB 2-930307-04-8, pp. 525-530 Fast Bootstrap for Least-square Support Vector Machines

More information

Independent Component (IC) Models: New Extensions of the Multinormal Model

Independent Component (IC) Models: New Extensions of the Multinormal Model Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research

More information

6-1. Canonical Correlation Analysis

6-1. Canonical Correlation Analysis 6-1. Canonical Correlation Analysis Canonical Correlatin analysis focuses on the correlation between a linear combination of the variable in one set and a linear combination of the variables in another

More information

Minimum Regularized Covariance Determinant Estimator

Minimum Regularized Covariance Determinant Estimator Minimum Regularized Covariance Determinant Estimator Honey, we shrunk the data and the covariance matrix Kris Boudt (joint with: P. Rousseeuw, S. Vanduffel and T. Verdonck) Vrije Universiteit Brussel/Amsterdam

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Small sample size in high dimensional space - minimum distance based classification.

Small sample size in high dimensional space - minimum distance based classification. Small sample size in high dimensional space - minimum distance based classification. Ewa Skubalska-Rafaj lowicz Institute of Computer Engineering, Automatics and Robotics, Department of Electronics, Wroc

More information

A Peak to the World of Multivariate Statistical Analysis

A Peak to the World of Multivariate Statistical Analysis A Peak to the World of Multivariate Statistical Analysis Real Contents Real Real Real Why is it important to know a bit about the theory behind the methods? Real 5 10 15 20 Real 10 15 20 Figure: Multivariate

More information

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015 EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,

More information

MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS. Richard Brereton

MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS. Richard Brereton MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS Richard Brereton r.g.brereton@bris.ac.uk Pattern Recognition Book Chemometrics for Pattern Recognition, Wiley, 2009 Pattern Recognition Pattern Recognition

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Face detection and recognition. Detection Recognition Sally

Face detection and recognition. Detection Recognition Sally Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification

More information

Discriminant analysis for compositional data and robust parameter estimation

Discriminant analysis for compositional data and robust parameter estimation Noname manuscript No. (will be inserted by the editor) Discriminant analysis for compositional data and robust parameter estimation Peter Filzmoser Karel Hron Matthias Templ Received: date / Accepted:

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

INVARIANT COORDINATE SELECTION

INVARIANT COORDINATE SELECTION INVARIANT COORDINATE SELECTION By David E. Tyler 1, Frank Critchley, Lutz Dümbgen 2, and Hannu Oja Rutgers University, Open University, University of Berne and University of Tampere SUMMARY A general method

More information

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Hujia Yu, Jiafu Wu [hujiay, jiafuwu]@stanford.edu 1. Introduction Housing prices are an important

More information

CS 231A Section 1: Linear Algebra & Probability Review

CS 231A Section 1: Linear Algebra & Probability Review CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability

More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

Supervised Learning. Regression Example: Boston Housing. Regression Example: Boston Housing

Supervised Learning. Regression Example: Boston Housing. Regression Example: Boston Housing Supervised Learning Unsupervised learning: To extract structure and postulate hypotheses about data generating process from observations x 1,...,x n. Visualize, summarize and compress data. We have seen

More information

Chapter 3. Measuring data

Chapter 3. Measuring data Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring

More information

Chapter 3 Examining Data

Chapter 3 Examining Data Chapter 3 Examining Data This chapter discusses methods of displaying quantitative data with the objective of understanding the distribution of the data. Example During childhood and adolescence, bone

More information

Dimensionality Reduction Using PCA/LDA. Hongyu Li School of Software Engineering TongJi University Fall, 2014

Dimensionality Reduction Using PCA/LDA. Hongyu Li School of Software Engineering TongJi University Fall, 2014 Dimensionality Reduction Using PCA/LDA Hongyu Li School of Software Engineering TongJi University Fall, 2014 Dimensionality Reduction One approach to deal with high dimensional data is by reducing their

More information

Lecture 12 Robust Estimation

Lecture 12 Robust Estimation Lecture 12 Robust Estimation Prof. Dr. Svetlozar Rachev Institute for Statistics and Mathematical Economics University of Karlsruhe Financial Econometrics, Summer Semester 2007 Copyright These lecture-notes

More information

CLASSIFICATION EFFICIENCIES FOR ROBUST LINEAR DISCRIMINANT ANALYSIS

CLASSIFICATION EFFICIENCIES FOR ROBUST LINEAR DISCRIMINANT ANALYSIS Statistica Sinica 8(008), 58-599 CLASSIFICATION EFFICIENCIES FOR ROBUST LINEAR DISCRIMINANT ANALYSIS Christophe Croux, Peter Filzmoser and Kristel Joossens K.U. Leuven and Vienna University of Technology

More information