Binary Discrimination Methods for High Dimensional Data with a Geometric Representation

Binary Discrimination Methods for High Dimensional Data with a Geometric Representation

Addy Bolivar-Cime*, Luis Miguel Cordova-Rodriguez

Universidad Juárez Autónoma de Tabasco, División Académica de Ciencias Básicas

*Corresponding author. División Académica de Ciencias Básicas - UJAT, Carretera Cunduacán-Jalpa KM. 1, Col. La Esmeralda, C.P., addy.bolivar@ujat.mx.

Received: November 11, 2016

Abstract

Four binary discrimination methods are studied in the context of HDLSS data with an asymptotic geometric representation, when the dimension increases while the sample sizes of the classes are fixed. We show that the methods Support Vector Machine, Mean Difference, Distance Weighted Discrimination and Maximal Data Piling have the same asymptotic behavior as the dimension increases. We study the consistent, inconsistent and strongly inconsistent cases in terms of the angles between the normal vectors of the separating hyperplanes of the methods and the optimal direction for classification. A simulation study is done to assess the theoretical results.

1 Introduction

The asymptotic geometric representation of multivariate data as the dimension increases while the sample size is fixed has been studied by Ahn et al. (2007), Hall et al. (2005) and Qiao et al. (2010). They show conditions under which High Dimension, Low Sample Size (HDLSS) data tend to lie deterministically at the vertices of a regular simplex, similarly to multivariate standard Gaussian data. This geometric structure is used to analyze the behavior of some statistical methodologies for multivariate data in the HDLSS setting. In particular, in Ahn et al. (2007), Jung and Marron (2009) and Jung et al. (2012) the behavior of Principal Component Analysis under the geometric representation of HDLSS data is discussed. In Hall et al. (2005) the behavior of some
binary discrimination methods is studied in terms of probability of misclassification, including the methods Support Vector Machine (Cristianini and Shawe-Taylor (2000), Vapnik (1995)), Distance Weighted Discrimination (Marron (2015), Marron et al. (2007)) and Mean Difference (Scholkopf and Smola (2002)), when the data have this asymptotic geometric representation as the dimension increases. In Qiao et al. (2010) a similar analysis is done for the binary discrimination method weighted Distance Weighted Discrimination (wDWD); they also study the asymptotic behavior of this method in terms of the angle between the normal vector of the separating hyperplane and the optimal direction for classification. In Bolivar-Cime and Marron (2013), considering Gaussian data with common diagonal covariance matrix, it is shown that the four methods Support Vector Machine (SVM), Distance Weighted Discrimination (DWD), Mean Difference (MD) and Maximal Data Piling (MDP) (Ahn and Marron (2010)) have the same asymptotic behavior as the dimension increases and the sample sizes are fixed, in terms of the angles between the normal vectors of the separating hyperplanes of the methods. In the present paper we prove that this result of Bolivar-Cime and Marron (2013) holds for more general HDLSS data with an asymptotic geometric representation. Note that, due to the asymptotic geometric representation of the data, if the two classes of the training data set have the same distribution except that one class has mean $v_d$ and the other class has mean zero, the vector $v_d$ is the optimal direction for the normal vector of a separating hyperplane of the data. We show that as the dimension $d$ increases the angles between the normal vectors of the separating hyperplanes of the four methods and $v_d$ converge to zero in probability when $\|v_d\|/d^{1/2} \to \infty$, i.e. the methods are consistent, and converge to $\pi/2$ in probability when $\|v_d\|/d^{1/2} \to 0$, i.e. they are strongly inconsistent. In the case where $\|v_d\|/d^{1/2} \to c$ with $0 < c < \infty$, we show that the angles converge in probability to a number in the interval $(0, \pi/2)$ as the dimension increases, i.e. the four methods are inconsistent. We provide some examples of HDLSS data for which our results are valid. The results of the present paper complement the results of Hall et al. (2005) about the behavior of some binary classification methods. Furthermore, our results extend the results of Ahn et al. (2007) and Qiao et al. (2010), since in Ahn et al. (2007) the method MD is not considered, and in Qiao et al. (2010) only the methods DWD and wDWD are considered. Our results also provide a theoretical explanation of the phenomena observed in the simulation studies of Ahn and Marron (2010) and Marron et al. (2007), in terms of means of misclassification rates. Additionally, we compare the asymptotic behavior of the angles between the normal vectors of the four methods and the optimal direction with the asymptotic behavior of the probabilities of misclassification. We also present a simulation study to numerically assess our theoretical results. As is mentioned in Hall et al. (2005), when $d \ge N$, where $N$ is the sample size of the data, and no $k$ data points lie in a $(k-2)$-dimensional hyperplane (which happens with probability one for data with continuous
probability densities), the training data set is linearly separable. In this paper we restrict our attention to the linearly separable case, assuming that the HDLSS data set treated here is linearly separable with probability one. This paper is divided as follows. In Section 2 we present the geometric representation of some HDLSS data. Our theoretical results about the asymptotic behavior of the normal vectors of the separating hyperplanes of the four methods are presented in Section 3. In Section 4 we present the asymptotic behavior of the probabilities of misclassification of the four methods. In Section 5 we provide a simulation study to evaluate our results. The technical details of the paper are presented in Section 6. Finally, we provide conclusions in Section 7.

2 Geometric representation of high dimensional data

The geometric representation of high dimensional data concerns the geometric structure that multivariate data have as the dimension tends to infinity while the sample size is fixed. This geometric structure can be found, for example, in multivariate standard Gaussian data as the dimension tends to infinity.

2.1 Standard Gaussian geometrical representation

As is mentioned in Hall et al. (2005), if $Z$ has a $d$-variate standard Gaussian distribution, then as $d$ tends to infinity we have

$\|Z\| = d^{1/2} + O_p(1).$  (1)

This means that as the dimension increases the random vector $Z$ tends to lie near the surface of an expanding sphere. Furthermore, if $Z_1$ and $Z_2$ are two independent $d$-variate standard Gaussian vectors, as $d$ increases we have

$\|Z_1 - Z_2\| = (2d)^{1/2} + O_p(1).$  (2)

Thus, the distance between the data vectors is approximately constant as the dimension increases. It is also true that for these vectors, as the dimension increases, we have

$\mathrm{Angle}(Z_1, Z_2) = \frac{\pi}{2} + O_p(d^{-1/2}).$  (3)

That is, the angle between the vectors tends to be an orthogonal angle as the dimension tends to infinity.
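The following minimal numerical sketch (Python with NumPy; the code and all names in it are ours, not the paper's) illustrates (1)-(3): the normalized norm and distance ratios approach 1 and the pairwise angle approaches $\pi/2$ as $d$ grows.

import numpy as np

rng = np.random.default_rng(0)
for d in (10, 100, 1000, 10000):
    Z1 = rng.standard_normal(d)
    Z2 = rng.standard_normal(d)
    cos_angle = Z1 @ Z2 / (np.linalg.norm(Z1) * np.linalg.norm(Z2))
    print(d,
          np.linalg.norm(Z1) / d**0.5,             # -> 1 by (1)
          np.linalg.norm(Z1 - Z2) / (2 * d)**0.5,  # -> 1 by (2)
          np.arccos(cos_angle))                    # -> pi/2 by (3)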

In general, if we have $n$ of these independent $d$-variate standard Gaussian vectors, as the dimension increases all pairwise distances are approximately equal and all pairwise angles are approximately perpendicular. Because all pairwise distances are nearly the same, the $n$ vectors tend to be the vertices of a regular $n$-polyhedron, that is, a polyhedron with $n$ vertices and with edges of the same length. This $n$-polyhedron is called an $n$-simplex.

2.2 General geometrical representation

In Ahn et al. (2007), Hall et al. (2005) and Qiao et al. (2010) it is shown that the approximate $n$-simplex structure of standard Gaussian data can be observed for more general data, as presented below.

Let $X(d) = (X^{(1)}, X^{(2)}, \ldots, X^{(d)})'$ be the vector obtained by truncating an infinite time series, which is written as the vector $X = (X^{(1)}, X^{(2)}, \ldots)$. Let $\mathcal{X}(d) = \{X_1(d), X_2(d), \ldots, X_n(d)\}$ be a random sample of independent and identically distributed random vectors with the same distribution as $X(d)$. Assume the following:

(a) The fourth moments of the entries of the data vectors are uniformly bounded.

(b) For a constant $\sigma^2$,

$\frac{1}{d}\sum_{k=1}^{d} \mathrm{var}(X^{(k)}) \to \sigma^2$ as $d \to \infty$.  (4)

(c) The time series $X$ is $\rho$-mixing for functions that are dominated by quadratics, in the sense that whenever functions $f$ and $g$ of two variables satisfy $|f(u, v)| + |g(u, v)| \le C u^2 v^2$ for a fixed $C > 0$ and all $u$ and $v$, we have

$\sup_{1 \le k, l < \infty,\ |k-l| \ge r} \mathrm{corr}[f(U^{(k)}, V^{(k)}), g(U^{(l)}, V^{(l)})] \le \rho(r),$  (5)

with $(U, V) = (X, X), (X, X')$, where $X'$ is independent of and has the same distribution as $X$, and the function $\rho$ satisfies $\rho(r) \to 0$ as $r \to \infty$.

Under the conditions (a), (b) and (c), in Hall et al. (2005) it is shown that the distance between $X_i(d)$ and $X_j(d)$, for $i \ne j$, is approximately $(2\sigma^2 d)^{1/2}$ when $d$ is large, in the sense that

$\frac{\|X_i - X_j\|^2}{d} \xrightarrow{p} 2\sigma^2$ as $d \to \infty$,  (6)
where $\xrightarrow{p}$ denotes convergence in probability. It is also true that

$\frac{\|X_i\|^2}{d} \xrightarrow{p} \sigma^2$ as $d \to \infty$.  (7)

Therefore, the asymptotic $n$-simplex structure of standard Gaussian data is also observed for these data.

As is mentioned in Ahn et al. (2007), condition (c) states that the variables have to be nearly independent, since they must satisfy a $\rho$-mixing condition. However, this condition is too strict, because it is common to have strong collinearity among variables. Furthermore, this condition depends on the order of the data entries, which can be arbitrary in many applications. In Ahn et al. (2007) and Qiao et al. (2010) it is shown that the asymptotic $n$-simplex structure of high dimensional data can be observed under mild conditions, which are presented below.

Let $X_d = [X_1, X_2, \ldots, X_n]$ be a $d \times n$ data matrix with $d > n$, where the random vectors $X_i = (X_i^{(1)}, X_i^{(2)}, \ldots, X_i^{(d)})'$, $i = 1, 2, \ldots, n$, are independent and identically distributed from a $d$-dimensional multivariate distribution with mean zero and nonnegative definite covariance matrix $\Sigma_d$. Suppose that the eigenvalue decomposition of $\Sigma_d$ is $\Sigma_d = V_d \Lambda_d V_d'$, where $\Lambda_d$ is the diagonal matrix of eigenvalues $\lambda_{1,d} \ge \lambda_{2,d} \ge \cdots \ge \lambda_{d,d} \ge 0$ and $V_d$ is the matrix of corresponding eigenvectors. If $\Sigma_d$ is positive definite (all its eigenvalues are positive), define $Z_d = \Lambda_d^{-1/2} V_d' X_d$, which is a $d \times n$ random data matrix from a distribution with identity covariance matrix. Observe that, if the columns of $X_d$ are Gaussian, the elements of $Z_d$ are independent standard Gaussian univariate variables. The sample covariance matrix is given by $S_d = n^{-1} X_d X_d'$, because the population mean is the zero vector. The dual sample covariance matrix is defined as $S_{D,d} = n^{-1} X_d' X_d$, which is an $n \times n$ matrix. It is important to note that $S_{D,d}$ has the same nonzero eigenvalues as $S_d$. Using the fact that $V_d' V_d$ is the identity, we have

$n S_{D,d} = Z_d' \Lambda_d Z_d = \sum_{i=1}^{d} \lambda_{i,d} W_{i,d},$  (8)

where $W_{i,d} = Z_{i,d} Z_{i,d}'$ and $Z_{i,d}$, for $i = 1, 2, \ldots, d$, are the row vectors of $Z_d$ (written as column vectors). Note that if $X_d$ is Gaussian, then $W_{i,d}$, for $i = 1, 2, \ldots, d$, are independent matrices from the Wishart distribution $W_n(1, I_n)$. Assume the following for the matrix $X_d$:

(a') The fourth moments of the variables are uniformly bounded.

(b') The representation in (8) holds.
(c') The eigenvalues of $\Sigma_d$ are sufficiently diffuse, in the sense that

$\frac{\sum_{i=1}^{d} \lambda_{i,d}^2}{\left(\sum_{i=1}^{d} \lambda_{i,d}\right)^2} \to 0$ as $d \to \infty$.  (9)

(d') The entries of $Z_d$ are independent.

In Ahn et al. (2007) and Qiao et al. (2010) it is shown that under the conditions (a')-(d') the squared distance between $X_i$ and $X_j$, for $i \ne j$, is approximately $2\sum_{i=1}^{d} \lambda_{i,d}$ when $d$ is large, in the sense that

$\frac{\|X_i - X_j\|^2}{\sum_{i=1}^{d} \lambda_{i,d}} \xrightarrow{p} 2$ as $d \to \infty$.  (10)

It is also true that

$\frac{\|X_i\|^2}{\sum_{i=1}^{d} \lambda_{i,d}} \xrightarrow{p} 1$ as $d \to \infty$.  (11)

Therefore the data tend to form an $n$-simplex as the dimension increases. It is also shown in Ahn et al. (2007) that condition (c') is milder than the $\rho$-mixing condition (c). In Jung and Marron (2009) it is shown that condition (d') can be relaxed by assuming that the entries of $Z_d$ are $\rho$-mixing under some permutation; however, this last condition is still very strict. By the results of Yata and Aoshima (2012) we have (10) and (11) under the conditions (a')-(c') and the new condition

$\frac{\sum_{s,t=1}^{d} \lambda_{s,d}\lambda_{t,d}\, E\{(Z_{1(s)}^2 - 1)(Z_{1(t)}^2 - 1)\}}{\mathrm{tr}(\Sigma_d)^2} \to 0$ as $d \to \infty$,  (12)

where $Z_{1(s)}$ is the first element of $Z_{s,d}$. In Yata and Aoshima (2012) it is mentioned that (12) is milder than (d') (or the $\rho$-mixing condition for the entries of $Z_d$), since (12) holds under (c') and (d') (or the $\rho$-mixing condition for the entries of $Z_d$).

3 Asymptotic behavior of the normal vectors

In this section we present a generalization of Theorem 3.1 of Bolivar-Cime and Marron (2013). That theorem states that the asymptotic behavior of the four binary discrimination methods SVM, DWD, MD and MDP is the same as the dimension increases, when the two classes $C_+$ and $C_-$ are Gaussian with means $v_d$ and zero, respectively, and common diagonal covariance matrix $\Sigma_d = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$, where $\{\sigma_k^2\}_{k=1}^{\infty}$ is a bounded sequence of positive numbers such that $\sum_{k=1}^{d} \sigma_k^2/d \to \sigma^2$ as $d \to \infty$, for some $\sigma > 0$. It can be
seen that these data have the asymptotic geometric representation of Section 2.2, and since the difference between the two classes is determined by the mean vector $v_d$, the optimal direction for the normal vector of a separating hyperplane of these data is $v_d$. Specifically, Theorem 3.1 of Bolivar-Cime and Marron (2013) states that, under the above assumptions, when $\|v_d\|/d^{1/2} \to \infty$ the angles between the normal vectors of the separating hyperplanes of the four methods and the optimal direction $v_d$ converge to zero in probability as $d \to \infty$, i.e. the methods are consistent; when $\|v_d\|/d^{1/2} \to 0$ these angles converge to $\pi/2$ in probability, i.e. they are strongly inconsistent; and when $\|v_d\|/d^{1/2} \to c$ with $0 < c < \infty$, these angles converge to $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2})$, where $\gamma = \frac{1}{m} + \frac{1}{n}$ with $m$ and $n$ the sample sizes of $C_+$ and $C_-$, respectively, i.e. they are inconsistent.

Our next theorem claims that when we consider multivariate data with an asymptotic geometric representation, similar to that of multivariate standard Gaussian data or the multivariate data of Section 2.2, the result of Theorem 3.1 of Bolivar-Cime and Marron (2013) still holds under some conditions.

Theorem 3.1 Let $m, n$ be positive integers and let $N = m + n$. Let $Z_1, Z_2, \ldots, Z_N$ be independent and identically distributed $d$-dimensional random vectors, with mean zero and covariance matrix $\Sigma_d$. Let $C_+$ be the class of the random vectors $X_i = Z_i + v_d$, for $i = 1, 2, \ldots, m$, and let $C_-$ be the class of the random vectors $Y_j = Z_{m+j}$, $j = 1, 2, \ldots, n$, where $\|v_d\|/d^{1/2} \to c$, with $0 \le c \le \infty$. Assume the following:

(i) The random vectors have the asymptotic geometric representation

$\frac{\|Z_i\|^2}{d} \xrightarrow{p} \sigma^2$ and $\frac{\|Z_i - Z_j\|^2}{d} \xrightarrow{p} 2\sigma^2$  (13)

as $d \to \infty$ for some $\sigma > 0$, for all $i, j = 1, 2, \ldots, N$ and $i \ne j$.

(ii) The covariance matrix $\Sigma_d = (\sigma_{k,r})$ and the vector $v_d = (v_d^{(1)}, v_d^{(2)}, \ldots, v_d^{(d)})'$ satisfy

$\frac{D_{\Sigma_d}(v_d, 0)^2}{d\|v_d\|^2} = \frac{\sum_{k,r=1}^{d} \sigma_{k,r} v_d^{(k)} v_d^{(r)}}{d\|v_d\|^2} \to 0$ as $d \to \infty$,  (14)

where $D_{\Sigma_d}(x, y) = [(x - y)'\Sigma_d(x - y)]^{1/2}$, $x, y \in \mathbb{R}^d$, is the Mahalanobis distance corresponding to $\Sigma_d$.

Under these conditions, if $v_d^*$ represents the normal vector of the MD, SVM, DWD or MDP hyperplane of the
training data set, then

$\mathrm{Angle}(v_d, v_d^*) \xrightarrow{p} \begin{cases} 0, & \text{if } c = \infty; \\ \pi/2, & \text{if } c = 0; \\ \arccos\left(\dfrac{c}{(\gamma\sigma^2 + c^2)^{1/2}}\right), & \text{if } 0 < c < \infty; \end{cases}$

as $d \to \infty$, where $\gamma = \frac{1}{m} + \frac{1}{n}$.

As in Bolivar-Cime and Marron (2013), we observe in our Theorem 3.1 that the asymptotic behavior of the normal vectors of the four methods is related to the distance between the two classes, in particular to $\|v_d\|$. It is observed that when $\|v_d\|/d^{1/2} \to \infty$ ($c = \infty$) the classification is easier than when $\|v_d\|/d^{1/2} \to 0$ ($c = 0$). This is explained by the geometric representation of the data sets, since the data tend to lie at a distance $\sigma d^{1/2}$ from the mean when $d$ is large. To illustrate the role of $c$ in the last theorem, we present in Figure 1 the intuitive idea of the asymptotic behavior of the data, with $\sigma = 1$ and $0 < c < \infty$. By the asymptotic geometric representation of the data, when the dimension is large the data of the class $C_-$ will be around the sphere of radius $d^{1/2}$ with center at the origin, while the data of the class $C_+$ will be around the sphere of radius $d^{1/2}$ with center $v_d$. We also have that $\|v_d\| \approx c\,d^{1/2}$. Therefore, as $c$ approaches infinity the two spheres are far apart and the classification with the four methods is easier; as $c$ approaches zero the two spheres are very close and the classification is more difficult.

Figure 1: Asymptotic behavior of the data, with $\sigma = 1$ and $0 < c < \infty$. The classification is easier when the two spheres are far apart ($c \to \infty$) and more difficult when the two spheres are very close ($c \to 0$).

Observe that condition (14) is in terms of a Mahalanobis distance between the two class means. Furthermore, since $\Sigma_d = \Sigma_d^{1/2}\Sigma_d^{1/2}$, where $\Sigma_d^{1/2} = V_d \Lambda_d^{1/2} V_d'$, with $\Lambda_d$ the diagonal matrix of eigenvalues of $\Sigma_d$ and $V_d$ the orthogonal matrix of corresponding eigenvectors, we have that $D_{\Sigma_d}(x, y) = \|\Sigma_d^{1/2} x - \Sigma_d^{1/2} y\|$, i.e.
$D_{\Sigma_d}(x, y)$ is the euclidean distance between the vectors obtained from $x$ and $y$ by using the linear transformation $\Sigma_d^{1/2}$. Thus, $D_{\Sigma_d}(v_d, 0) = \|\Sigma_d^{1/2} v_d\|$ is the euclidean distance between that linear transformation of the class means. Hence, (14) is equivalent to $\|\Sigma_d^{1/2} v_d\|^2/(d\|v_d\|^2) \to 0$ as $d \to \infty$. We also have that condition (14) is equivalent to $D_{\Sigma_d}(v_d, 0)/\|v_d\| = o(d^{1/2})$ as $d \to \infty$, which is satisfied in particular if the ratio $D_{\Sigma_d}(v_d, 0)/\|v_d\|$ is bounded. Therefore, condition (14) controls the magnitude of the Mahalanobis distance with respect to the euclidean distance between the two class means. For example, if $v_d = (c\,d^{1/2}, 0, \ldots, 0)'$ with $0 < c < \infty$, and

$\Sigma_d = \begin{bmatrix} 2I_{d/2} & 0 \\ 0 & I_{d/2} \end{bmatrix}$  (15)

with $d$ even, then $D_{\Sigma_d}(v_d, 0)/\|v_d\| = 2^{1/2}$ and condition (14) is satisfied. For this example we observe that the Mahalanobis distance between the two class means is $2^{1/2}$ times the euclidean distance between them.

Remark 3.1 The condition (i) of the last theorem is satisfied if conditions (a)-(c) of Section 2.2 hold. If $\lambda_{1,d} \ge \cdots \ge \lambda_{d,d}$ are the eigenvalues of $\Sigma_d$ and $\sum_{i=1}^{d} \lambda_{i,d}/d \to \sigma^2$ as $d \to \infty$, for some $\sigma > 0$, then condition (i) is also satisfied under the conditions (a')-(d') of Section 2.2, or the conditions (a')-(c') and (12), because of (10) and (11).

Remark 3.2 There are several cases where the condition (ii) is satisfied; some of them are the following (see Section 6 for details):

(I) The covariance matrix $\Sigma_d$ is a diagonal matrix with entries uniformly bounded.

(II) The vector $v_d$ has a fixed number of nonzero entries, and the second moments of the entries of $Z_1$ are uniformly bounded.

(III) The entries of $v_d$ are uniformly bounded, $\|v_d\|/d^{1/2} \to c$ with $0 < c < \infty$, and one of the following conditions is satisfied:

a) the entries of $Z_1$ have second moments uniformly bounded, and are $\rho$-mixing in the sense that

$\sup_{|k-l| \ge r} |E(Z_1^{(k)} Z_1^{(l)})| = \sup_{|k-l| \ge r} |\sigma_{k,l}| = \rho(r) \to 0$ as $r \to \infty$;

b) the entries of $Z_1$ have second moments uniformly bounded, and $\Sigma_d$ has a fixed number of nonzero upper diagonals;
c) the eigenvalues of $\Sigma_d$, $\lambda_{1,d} \ge \lambda_{2,d} \ge \cdots \ge \lambda_{d,d} \ge 0$, satisfy

$\frac{\sum_{k=1}^{d} \lambda_{k,d}^2}{d^2} \to 0$ as $d \to \infty$.  (16)

(IV) The vector $v_d$ has the form $v_d = \beta_d \mathbf{1}_d$, with $\beta_d \ge 0$ such that $\beta_d \to c$ as $d \to \infty$, where $0 \le c \le \infty$, and one of the conditions a), b) or c) of (III) is satisfied.

By Remark 3.1, the multivariate Gaussian data of Theorem 3.1 of Bolivar-Cime and Marron (2013) satisfy the condition (i) of Theorem 3.1. These data also satisfy the condition (I) of Remark 3.2, therefore the condition (ii) of Theorem 3.1 is satisfied as well. In this sense Theorem 3.1 generalizes Theorem 3.1 of Bolivar-Cime and Marron (2013), by extending this result to more general multivariate data with an asymptotic geometric representation. Theorem 3.1 also provides a theoretical explanation of the simulation results presented in Ahn and Marron (2010) and Marron et al. (2007), where it is observed that some of the considered binary discrimination methods have approximately the same behavior, in terms of means of error rates (misclassification rates), as the dimension increases.

For the proof of Theorem 3.1 we need the next lemma, which is also a generalization of a result in Bolivar-Cime and Marron (2013). As is explained in Bolivar-Cime and Marron (2013), the normal vectors of the MD, SVM and DWD hyperplanes are proportional to the difference between two points on the convex hulls of the two classes. The next lemma provides an explicit asymptotic representation for these differences, when $0 \le c < \infty$ (the inconsistent cases). We denote by $\alpha = (\alpha_+', \alpha_-')'$ an $N$-dimensional vector, where $\alpha_+ = (\alpha_{1+}, \alpha_{2+}, \ldots, \alpha_{m+})'$ and $\alpha_- = (\alpha_{1-}, \alpha_{2-}, \ldots, \alpha_{n-})'$ are subvectors of $\alpha$ of dimensions $m$ and $n$, respectively.

Lemma 3.1 Assume the same as in Theorem 3.1. Suppose that $\|v_d\|/d^{1/2} \to c$ with $0 \le c < \infty$. Let $X = [X_1, X_2, \ldots, X_m]$ and $Y = [Y_1, Y_2, \ldots, Y_n]$. If the vector $\tilde{v}_d = X\alpha_+ - Y\alpha_-$, with $\alpha \ge 0$ and $\mathbf{1}_m'\alpha_+ = \mathbf{1}_n'\alpha_- = 1$, is proportional to the normal vector of the MD, SVM or DWD hyperplane, we have that

$\alpha_{i+} \xrightarrow{p} \frac{1}{m}, \quad \alpha_{j-} \xrightarrow{p} \frac{1}{n},$  (17)

as $d \to \infty$, for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.

Thus, in the inconsistent cases the normal vectors of SVM and DWD are approximately in the same direction as the normal vector of MD when $d$ is large. This is also true for MDP, as we will see in the proof of Theorem 3.1. Due to the asymptotic geometric representation of the data (see Figure 1), in Theorem 3.1 we have that as $c \to \infty$ the angles between the normal vectors of the four methods and $v_d$ tend to zero as $d$
increases; that is, in this case the direction of the four methods is approximately the direction of $v_d$ when $d$ is large.

4 Asymptotic properties of the probabilities of misclassification

Assume the same as in Theorem 3.1. Suppose $\|v_d\|/d^{1/2} \to c$, with $0 < c < \infty$. Due to the asymptotic geometric representation of the data, by Hall et al. (2005) and Qiao and Zhang (2015) we have the following two results about the asymptotic error rates of the SVM, MD and DWD hyperplanes.

Theorem 4.1 Assume that $n \ge m$; if need be, interchange $X$ and $Y$ to achieve this. If $c > \sigma(1/m - 1/n)^{1/2}$, then the probability that a new datum from either the $X$-population or the $Y$-population is correctly classified by the SVM or the MD hyperplane converges to 1 as $d \to \infty$. If $c < \sigma(1/m - 1/n)^{1/2}$, then with probability converging to 1 as $d \to \infty$ a new datum from either population will be classified by the SVM or the MD hyperplane as belonging to the $Y$-population.

Theorem 4.2 Assume that $n \ge m$; if need be, interchange $X$ and $Y$ to achieve this. If $c > \sigma[(n/m)^{1/2}/m - 1/n]^{1/2}$, then the probability that a new datum from either the $X$-population or the $Y$-population is correctly classified by the DWD hyperplane converges to 1 as $d \to \infty$. If $c < \sigma[(n/m)^{1/2}/m - 1/n]^{1/2}$, then with probability converging to 1 as $d \to \infty$ a new datum from either population will be classified by the DWD hyperplane as belonging to the $Y$-population.

As we mentioned before, under the hypotheses of Theorem 3.1 and if $0 < c < \infty$, the normal vector of MDP is approximately in the same direction as the normal vector of MD when $d$ is large. Therefore, if we take the intercept of the MDP hyperplane as $b = (v_d^*)'(\bar{X} + \bar{Y})/2$, where $\bar{X}$ and $\bar{Y}$ are the class means of the $X$ and $Y$ populations, respectively, and $v_d^*$ is the normal vector of MDP, then the MDP hyperplane coincides with the MD hyperplane as $d$ tends to infinity. Thus, Theorem 4.1 holds for the MDP method.

By the above results, if $m = n$ the four methods give asymptotically correct classification of a new datum from any population, for all $0 < c < \infty$. In the case where the sample sizes $m$ and $n$ are unequal, for example if $n > m$, define $M_1 = \sigma(1/m - 1/n)^{1/2}$ and $M_2 = \sigma[(n/m)^{1/2}/m - 1/n]^{1/2}$, and note that $M_2 > M_1$. By the last theorems, if $c > M_2$ the four methods give asymptotically correct classification of a new datum from any population; if $M_2 > c > M_1$ then SVM, MD and MDP give asymptotically correct classification of a new datum from any population, while DWD gives asymptotically perfect classification for the $Y$-population and asymptotically completely incorrect classification for the
$X$-population. This shows an asymptotic advantage of SVM, MD and MDP over DWD, in the sense of classifying correctly new data from any population for a wider range of values of $c$ as $d$ tends to infinity.

Observe that if $n \ge m$ and $c > M_2$, by the last results the four methods have the consistency property of the error rates, in the sense that their error rates tend to zero as $d$ tends to infinity, and by Theorem 3.1 the four methods have the inconsistency property of the normal vectors, in the sense that the angles between their normal vectors and the optimal direction do not tend to zero as $d$ tends to infinity. That is, in this case the asymptotic geometric representation of the data allows us to find separating hyperplanes that give perfect classification even when their normal vectors are not in the same direction as the optimal direction. By Theorem 3.1 we have that the limit of the angles between the normal vectors of the four methods and the optimal direction approaches zero as $c$ tends to infinity; however, it is sufficient to have $c > M_2$ in order to have asymptotically correct classification of a new datum from any population with the four methods, and in this situation the limit of the angles between the normal vectors of the four methods and the optimal direction is at most $\arccos\left(\dfrac{M_2}{(\gamma\sigma^2 + M_2^2)^{1/2}}\right)$.

In the case when $n$ and $m$ are unequal, the above results also show that the classification is easier when $c \to \infty$ than when $c \to 0$. In the case when $n = m$ this is also true, since, as we will see in the simulation study of Section 5, even when the four methods give asymptotically correct classification of a new datum from any population for all $0 < c < \infty$, as $c$ increases the convergence of the error rates to zero as $d$ tends to infinity is faster.

Aoshima and Yata (2014) and Nakayama et al. (2017) proposed a bias-corrected MD and a bias-corrected SVM, respectively, to improve the performance of the error rates of MD and SVM. Theorem 3.1 also holds for the bias-corrected classifiers, since it is only about the normal vectors of the separating hyperplanes. Assume the hypotheses of Theorem 3.1. Consider the bias-corrected MD proposed by Aoshima and Yata (2014), named the distance-based classifier, which is defined as follows: one classifies an individual $X_0$ into $C_+$ if $W(X_0) < 0$ and into $C_-$ otherwise, where

$W(X_0) = \left(X_0 - \frac{\bar{X} + \bar{Y}}{2}\right)'(\bar{Y} - \bar{X}) - \frac{\mathrm{tr}\,S_+}{2m} + \frac{\mathrm{tr}\,S_-}{2n},$

and $S_+$ and $S_-$ are the sample covariance matrices for $C_+$ and $C_-$, respectively. Here, $-\mathrm{tr}\,S_+/(2m) + \mathrm{tr}\,S_-/(2n)$ is a bias-correction term. This classifier is equivalent to the scale adjusted distance-based classifier given by Chan and Hall (2009). From Theorem 1 of Aoshima and Yata (2014), the error rates of $W(X_0)$ tend to zero as $d \to \infty$ if

$\frac{D_{\Sigma_d}(v_d, 0)^2}{\|v_d\|^4} \to 0$ and $\frac{\mathrm{tr}(\Sigma_d^2)}{\min(m, n)\|v_d\|^4} \to 0$, as $d \to \infty$.  (18)

As a referee pointed out, if (I) in Remark 3.2 holds, then $\mathrm{tr}(\Sigma_d^2) = O(d)$ and $D_{\Sigma_d}(v_d, 0)^2 = O(\|v_d\|^2)$. Therefore, if $\|v_d\|^2/d \to c^2 > 0$ then (18) holds and the error rates of the classifier $W(X_0)$ tend to zero
as $d \to \infty$, even when the angle between the normal vector of the separating hyperplane and the optimal direction does not tend to zero. Observe that (18) holds even when $\|v_d\| = d^{\delta}$ with $\delta \in (1/4, 1/2)$, which corresponds to the strongly inconsistent case of Theorem 3.1 since $\|v_d\|/d^{1/2} \to 0$. That is, in some cases the classifier $W(X_0)$ can have the consistency property of the error rates even when the normal vector is strongly inconsistent with the optimal direction. This shows the good properties of the bias-corrected classifiers.

5 Simulation study

In this section we present a simulation study to illustrate numerically the theoretical results presented previously. In Bolivar-Cime and Marron (2013) a simulation study is presented considering Gaussian data with identity covariance matrix. In the simulations of the present paper we take more general multivariate data with an asymptotic geometric representation, to illustrate the asymptotic behavior of the four considered binary discrimination methods as the dimension tends to infinity. To compare our results with those of Bolivar-Cime and Marron (2013), we take the same mean vectors $v_d$ considered in that paper. We take $v_d = (d^{\delta}, 0, \ldots, 0)'$ with $\delta = 0.2$, $\delta = 0.5$ and $\delta = 0.8$, which correspond to the cases $c = 0$, $c = 1$ and $c = \infty$ of Theorem 3.1, respectively. We also consider $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5$, $\beta = 1$ and $\beta = 10$, which correspond to the cases $c = 0.5$, $c = 1$ and $c = 10$, respectively. We consider sample sizes $m = n = 20$, thus $\gamma = 1/m + 1/n = 1/10$. We take the dimensions $d = 10, 30, 100, 300, 1000, 2000$ to consider the non-HDLSS and HDLSS settings. The number of training data sets generated for each value of $d$ is $M = 500$.

In Ahn and Marron (2010) the MDP method is defined when $d \ge N - 1$, where $N = m + n$. They also mention that a formula for the normal vector of MDP is equivalent to Fisher's discriminant vector when $d \le N - 2$, which does not have the piling property. We can view MDP as the HDLSS version of Fisher's linear discriminant method, with zero within-class scatter and maximized between-class scatter. Hence, in the simulation study presented here, we take the MDP normal vector as Fisher's discriminant vector when $d \le N - 2$.

Suppose $d$ is even. Let $Z = (Z^{(1)}, Z^{(2)}, \ldots, Z^{(d)})'$ be a random vector where $Z^{(i)}$, for $i = 1, 2, \ldots, d/2$, are independent and identically distributed random variables from the univariate standard normal distribution, and where $Z^{(j)} = Z^{(i)2} + Z^{(i)} - 1$, for $j = d/2 + i$ with $i = 1, 2, \ldots, d/2$. Note that $Z$ has mean zero, and it can be seen that its covariance matrix is given by

$\Sigma_d = \begin{bmatrix} I_{d/2} & I_{d/2} \\ I_{d/2} & 3I_{d/2} \end{bmatrix}.$  (19)
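As a sanity check of this construction, the following sketch (Python with NumPy; our illustration, not the authors' code) draws vectors with this distribution and verifies empirically that the covariance has the block form (19), and that $\|Z_i\|^2/d \approx 2$ and $\|Z_i - Z_j\|^2/d \approx 4$, in agreement with condition (i) and $\sigma^2 = 2$ (see Section 6.4).

import numpy as np

def draw_Z(d, size, rng):
    # First d/2 entries are i.i.d. N(0,1); entry d/2 + i equals
    # Z^(i)^2 + Z^(i) - 1, giving mean zero and covariance (19).
    W = rng.standard_normal((size, d // 2))
    return np.hstack([W, W**2 + W - 1])

rng = np.random.default_rng(1)
d = 1000
Z = draw_Z(d, 5000, rng)
C = Z.T @ Z / Z.shape[0]                          # empirical covariance
print(C[0, 0], C[0, d // 2], C[d // 2, d // 2])   # approx 1, 1, 3 as in (19)
print(np.sum(Z[0]**2) / d)                        # approx sigma^2 = 2
print(np.sum((Z[0] - Z[1])**2) / d)               # approx 2*sigma^2 = 4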

Let $Z_1, Z_2, \ldots, Z_N$ be independent and identically distributed random vectors with the same distribution as $Z$. In Section 6.4 it is shown that these data have an asymptotic geometric representation; in particular, the data satisfy condition (i) of Theorem 3.1 with $\sigma^2 = 2$. Now we will see that the data satisfy condition (ii) of Theorem 3.1. It is clear that the second moments of the entries of the vector $Z_1$ are uniformly bounded, since the distribution of the entries is only of two types (the standard normal distribution or the distribution of $Z^{(i)2} + Z^{(i)} - 1$, where $Z^{(i)}$ has standard normal distribution), and these distributions have finite second moments which are independent of the value of $d$. Observe that if $v_d = (d^{\delta}, 0, \ldots, 0)'$ with $\delta > 0$, by (II) of Remark 3.2 we have condition (ii) of Theorem 3.1. On the other hand, if $v_d = \beta\mathbf{1}_d$ with $\beta > 0$, by (IVb) of Remark 3.2 we also have the condition (ii) of Theorem 3.1, since $\Sigma_d$ has a fixed number of nonzero upper diagonals.

5.1 Behavior of the normal vectors of the four methods

The means of the angles between $v_d$ and the normal vectors of the separating hyperplanes of the four considered methods are computed for each value of $d$, in all the considered settings. In Figure 2 we show the means of the angles between $v_d$ and the normal vectors varying the dimension, for the case when $v_d = (d^{\delta}, 0, \ldots, 0)'$ with $\delta = 0.2$, $\delta = 0.5$ and $\delta = 0.8$. We observe that when $\delta = 0.2$ the means of the angles between the optimal vector $v_d$ and the normal vectors of the separating hyperplanes of the four methods tend to approximate $\pi/2 \approx 1.5708$ as the dimension increases. When $\delta = 0.8$ the means of the angles tend to zero as the dimension increases. In this case, when $\delta = 0.5$ the means of the angles tend to $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2}) \approx 0.4205$ as the dimension increases, where $c = 1$ ($\gamma = 1/10$ and $\sigma^2 = 2$). These results are in accordance with Theorem 3.1.

In Figure 3 we show the means of the angles between $v_d$ and the normal vectors of the four considered methods varying the dimension, when $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5, 1, 10$. It is observed that when $\beta$ is equal to 0.5, 1 and 10 the means of the angles approximate 0.7297, 0.4205 and 0.0447 as the dimension increases, respectively, which are the values of $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2})$ with $c$ equal to 0.5, 1 and 10, respectively. Again, these results are in accordance with Theorem 3.1.

For these data we observe that, although the four methods have the same asymptotic behavior as the dimension tends to infinity in terms of the angles between the normal vectors and the optimal direction, the two best methods are MD and DWD, with MD the method with the best behavior in most of the cases. Note that in the case where all the methods are consistent, MD is the method that converges fastest to the optimal direction; this is because the asymptotic optimal direction for the normal vector of the separating hyperplane is the difference between the two class means.

Figure 2: Means of the angles between $v_d$ and the normal vectors of the separating hyperplanes of the four methods, considering $v_d = (d^{\delta}, 0, \ldots, 0)'$ with $\delta = 0.2, 0.5, 0.8$.

In the HDLSS situation DWD sometimes coincides with MD. The third best method is SVM, and the worst method is MDP in almost all the considered cases. It is also observed that MDP has its worst behavior when $d$ is close to $N = 40$, which has been previously noted in the simulations of Bolivar-Cime and Marron (2013), and in the simulations of Ahn and Marron (2010) for Gaussian data in terms of means of misclassification rates.

5.2 Behavior of the error rates of the four methods

In order to compare the error rates of the four methods we take $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5, 1, 10$. Classification error rates were computed taking 100 new data points from each of the two classes. In Figure 4 we show the means of the error rates of the four considered methods for the cases $\beta = 0.5$ and $\beta = 1$, which correspond to the cases $c = 0.5$ and $c = 1$, respectively. In this figure we observe the convergence to zero of the means of the error rates of the four methods as $d \to \infty$, even though in Figure 3 we observe that the means of the angles between the normal vectors of the separating hyperplanes and the optimal direction do not converge to zero as $d \to \infty$. For the case $\beta = 10$, corresponding to the case $c = 10$, the means of the error rates of the four methods for all the considered values of $d$ were practically zero, therefore we do not include the graphs of the error rates for this case.

Figure 3: Means of the angles between $v_d$ and the normal vectors of the separating hyperplanes of the four methods, considering $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5, 1, 10$.

In Figure 4 we observe that generally the error rates of the methods DWD and MD are the smallest, and in the HDLSS situation these error rates practically coincide. The third best method in terms of error rates is SVM, and the worst method is MDP. Note that similar conclusions were obtained in Section 5.1 in terms of the angles between the normal vectors of the methods and the optimal direction; however, DWD sometimes has smaller error rates than MD, and MD generally has smaller angles than DWD. We also observe that as $c$ increases the convergence of the error rates to zero is faster.

In Hall et al. (2005) the behavior of the error rates of the methods MD, SVM and DWD when $n$ and $m$ are unequal is studied by simulations. They observe in their simulations that when $d$ tends to infinity the error rates of MD and SVM practically coincide, and DWD does substantially worse than these methods, considering several values of $c > 0$. We did similar simulations but now including the MDP method (not shown here to save space), and we observed that when $d$ tends to infinity the error rates of MDP practically coincide with the error rates of MD and SVM. This was expected, since for $0 < c < \infty$ the hyperplanes of these three methods coincide as $d$ tends to infinity due to the asymptotic geometric representation of the data, and by the results of Section 4 these three methods behave better than DWD in terms of error rates.
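To connect the angle and error-rate viewpoints numerically, the sketch below (Python with NumPy; a hedged illustration under our own naming, not the authors' simulation code) builds one training set from the model of this section with $v_d = \beta\mathbf{1}_d$, computes the MD direction, and compares the empirical angle with the limit $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2})$ of Theorem 3.1, together with the empirical error rate of the MD hyperplane on new data.

import numpy as np

rng = np.random.default_rng(2)
d, m, n, beta, sigma2 = 2000, 20, 20, 0.5, 2.0   # c = beta, sigma^2 = 2

def draw_Z(size):
    W = rng.standard_normal((size, d // 2))
    return np.hstack([W, W**2 + W - 1])          # covariance (19)

v = beta * np.ones(d)                            # ||v|| / d^(1/2) = beta
Xtr, Ytr = draw_Z(m) + v, draw_Z(n)              # classes C+ and C-
u = Xtr.mean(axis=0) - Ytr.mean(axis=0)          # MD normal vector

angle = np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
gamma, c = 1/m + 1/n, beta
print(angle, np.arccos(c / np.sqrt(gamma * sigma2 + c**2)))  # empirical vs limit

# Error rate of the MD hyperplane passing through the midpoint of the means.
b = u @ (Xtr.mean(axis=0) + Ytr.mean(axis=0)) / 2
Xte, Yte = draw_Z(100) + v, draw_Z(100)
err = (np.mean(Xte @ u - b < 0) + np.mean(Yte @ u - b > 0)) / 2
print(err)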

Figure 4: Means of the error rates of the four methods, considering $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5, 1$.

6 Technical details

The main ideas for the proofs of our results are similar to those of Bolivar-Cime and Marron (2013); however, we use the Tchebysheff and the Cauchy-Schwartz inequalities. First we present some consequences of the hypotheses of Theorem 3.1 that will be very useful along this section. Let $\langle x, y \rangle = \sum_{k=1}^{d} x^{(k)} y^{(k)}$ be the inner product of the vectors $x, y \in \mathbb{R}^d$. By condition (i) we have

$\frac{\langle Z_i, Z_j \rangle}{d} = \frac{1}{2}\left(\frac{\|Z_i\|^2}{d} + \frac{\|Z_j\|^2}{d} - \frac{\|Z_i - Z_j\|^2}{d}\right) \xrightarrow{p} 0, \quad \text{for } i \ne j,$  (20)

as $d \to \infty$. Furthermore, for any $i = 1, 2, \ldots, N$, by condition (ii) and the Tchebysheff inequality, for all $\tau > 0$,

$P\left(\frac{|\langle Z_i, v_d \rangle|}{d^{1/2}\|v_d\|} > \tau\right) \le \frac{E(\langle Z_i, v_d \rangle^2)}{\tau^2 d\|v_d\|^2} = \frac{\sum_{k,l=1}^{d} E(Z_i^{(k)} Z_i^{(l)}) v_d^{(k)} v_d^{(l)}}{\tau^2 d\|v_d\|^2} = \frac{1}{\tau^2}\,\frac{\sum_{k,l=1}^{d} \sigma_{k,l} v_d^{(k)} v_d^{(l)}}{d\|v_d\|^2} \to 0$

as $d \to \infty$. Thus for all $i = 1, 2, \ldots, N$,

$\frac{\langle Z_i, v_d \rangle}{d^{1/2}\|v_d\|} \xrightarrow{p} 0$ as $d \to \infty$.  (21)

Note that if $0 \le c < \infty$ then (21) implies

$\frac{\langle Z_i, v_d \rangle}{d} = \frac{\langle Z_i, v_d \rangle}{d^{1/2}\|v_d\|} \cdot \frac{\|v_d\|}{d^{1/2}} \xrightarrow{p} 0 \cdot c = 0$ as $d \to \infty$.  (22)
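The limits (20) and (21) are easy to see numerically; here is a small sketch (our code, using the simulation distribution of Section 5 and $v_d = \beta\mathbf{1}_d$ for concreteness):

import numpy as np

rng = np.random.default_rng(3)
for d in (100, 1000, 10000):
    W = rng.standard_normal((2, d // 2))
    Z = np.hstack([W, W**2 + W - 1])             # two independent copies of Z
    v = 0.5 * np.ones(d)                         # v_d = beta * 1_d
    print(d,
          Z[0] @ Z[1] / d,                            # (20): -> 0
          Z[0] @ v / (d**0.5 * np.linalg.norm(v)))    # (21): -> 0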

6.1 Proof of Lemma 3.1

Observe the following:

$\|X_i - Y_j\|^2 = \|Z_i + v_d - Y_j\|^2 = \|Z_i\|^2 + \|Y_j\|^2 + \|v_d\|^2 - 2\langle Z_i, Y_j \rangle + 2\langle Z_i, v_d \rangle - 2\langle Y_j, v_d \rangle.$

Therefore, by (13), (20) and (22) it follows that

$\frac{\|X_i - Y_j\|^2}{d} \xrightarrow{p} 2\sigma^2 + c^2$ as $d \to \infty$.  (23)

We also have by (13) that for $i \ne j$

$\frac{\|X_i - X_j\|^2}{d} \xrightarrow{p} 2\sigma^2, \quad \frac{\|Y_i - Y_j\|^2}{d} \xrightarrow{p} 2\sigma^2$ as $d \to \infty$.  (24)

Thus (23) and (24) imply that the vectors $X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n$ tend to be the vertices of an $N$-polyhedron as $d \to \infty$. The rest of the proof is based on arguments similar to those in the proof of Lemma 3.1 of Bolivar-Cime and Marron (2013), which are presented below. The asymptotic $N$-polyhedron has $m$ of its vertices arranged as those of an $m$-simplex and the other $n$ vertices arranged in an $n$-simplex. After rescaling by $d^{-1/2}$, when $d$ tends to infinity the data in $C_+$ and $C_-$ tend to be the vertices of an $m$-simplex and an $n$-simplex, respectively, with edges of length $2^{1/2}\sigma$. Let $X_1, \ldots, X_m$ be the vertices of the $m$-simplex and let $Y_1, \ldots, Y_n$ be the vertices of the $n$-simplex. Let $\tilde{v}_d = X\alpha_+ - Y\alpha_-$, with $\alpha \ge 0$ and $\mathbf{1}_m'\alpha_+ = \mathbf{1}_n'\alpha_- = 1$, be proportional to the normal vector of the MD, SVM or DWD hyperplane. For the two classes of the $N$-polyhedron, it is shown in the proof of Lemma 3.1 of Bolivar-Cime and Marron (2013) that this $\alpha$ is given by

$\alpha_{i+} = \frac{1}{m}, \quad \alpha_{j-} = \frac{1}{n}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n,$

for these three methods. Therefore we have (17).
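A quick numerical look at (23) and (24) under the Section 5 distribution ($\sigma^2 = 2$) with $v_d = \beta\mathbf{1}_d$, so that $c = \beta$ (our sketch, not from the paper):

import numpy as np

rng = np.random.default_rng(4)
d, beta = 20000, 0.5
W = rng.standard_normal((3, d // 2))
Z = np.hstack([W, W**2 + W - 1])
X1, X2, Y1 = Z[0] + beta, Z[1] + beta, Z[2]        # v_d = beta * 1_d
print(np.sum((X1 - Y1)**2) / d, 2 * 2 + beta**2)   # (23): approx 2*sigma^2 + c^2
print(np.sum((X1 - X2)**2) / d, 2 * 2)             # (24): approx 2*sigma^2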

6.2 Proof of Theorem 3.1

6.2.1 When $v_d^*$ is the normal vector of the MD, SVM or DWD hyperplane

Case 1: $c = \infty$. Let $\tilde{v}_d = \sum_{i=1}^{m} \alpha_{i+} X_i - \sum_{i=1}^{n} \alpha_{i-} Y_i$ be proportional to the vector $v_d^*$, with $\alpha \ge 0$ and $\mathbf{1}_m'\alpha_+ = \mathbf{1}_n'\alpha_- = 1$, as in Lemma 3.1. We have

$\langle \tilde{v}_d, v_d \rangle = \sum_{i=1}^{m} \alpha_{i+} \langle Z_i, v_d \rangle - \sum_{i=1}^{n} \alpha_{i-} \langle Y_i, v_d \rangle + \|v_d\|^2.$  (25)

Note that by the Cauchy-Schwartz inequality $|\langle Z_i, v_d \rangle| \le \|Z_i\|\|v_d\|$, therefore

$\frac{|\langle Z_i, v_d \rangle|}{\|v_d\|^2} \le \frac{\|Z_i\|}{d^{1/2}} \cdot \frac{d^{1/2}}{\|v_d\|} \xrightarrow{p} \sigma \cdot 0 = 0$  (26)

as $d \to \infty$, for $i = 1, 2, \ldots, N$. Since $0 \le \alpha_{i+}, \alpha_{i-} \le 1$, it follows that

$\frac{\sum_{i=1}^{m} \alpha_{i+} \langle Z_i, v_d \rangle}{\|v_d\|^2} \xrightarrow{p} 0, \quad \frac{\sum_{i=1}^{n} \alpha_{i-} \langle Y_i, v_d \rangle}{\|v_d\|^2} \xrightarrow{p} 0$ as $d \to \infty$.  (27)

Thus, by (25) we have

$\frac{\langle \tilde{v}_d, v_d \rangle}{\|v_d\|^2} \xrightarrow{p} 1$ as $d \to \infty$.  (28)

We also have

$\|\tilde{v}_d\|^2 = \left\|\sum_{i=1}^{m} \alpha_{i+} Z_i - \sum_{i=1}^{n} \alpha_{i-} Y_i\right\|^2 + 2\left(\sum_{i=1}^{m} \alpha_{i+} \langle Z_i, v_d \rangle - \sum_{i=1}^{n} \alpha_{i-} \langle Y_i, v_d \rangle\right) + \|v_d\|^2.$  (29)

The first term of the last equation is equal to

$\sum_{i=1}^{m} \alpha_{i+}^2 \|Z_i\|^2 + 2\sum_{i<j} \alpha_{i+}\alpha_{j+} \langle Z_i, Z_j \rangle + \sum_{i=1}^{n} \alpha_{i-}^2 \|Y_i\|^2 + 2\sum_{i<j} \alpha_{i-}\alpha_{j-} \langle Y_i, Y_j \rangle - 2\sum_{i=1}^{m}\sum_{j=1}^{n} \alpha_{i+}\alpha_{j-} \langle Z_i, Y_j \rangle.$  (30)

Using that $0 \le \alpha_{i+} \le 1$ and the Cauchy-Schwartz inequality we have

$\frac{\alpha_{i+}^2 \|Z_i\|^2}{\|v_d\|^2} \le \frac{\|Z_i\|^2}{d} \cdot \frac{d}{\|v_d\|^2} \xrightarrow{p} \sigma^2 \cdot 0 = 0, \quad \frac{\alpha_{i+}\alpha_{j+} |\langle Z_i, Z_j \rangle|}{\|v_d\|^2} \le \frac{\|Z_i\|}{d^{1/2}} \cdot \frac{\|Z_j\|}{d^{1/2}} \cdot \frac{d}{\|v_d\|^2} \xrightarrow{p} \sigma^2 \cdot 0 = 0$ for $i \ne j$,
as $d \to \infty$. Thus, if we divide the first two terms of (30) by $\|v_d\|^2$ they converge to zero in probability as $d \to \infty$. Analogously, dividing the rest of the terms of (30) by $\|v_d\|^2$, they converge to zero in probability as $d \to \infty$. Note that if we divide the second term on the right-hand side of (29) by $\|v_d\|^2$, due to (27) this term also converges to zero in probability as $d \to \infty$. Thus

$\frac{\|\tilde{v}_d\|^2}{\|v_d\|^2} \xrightarrow{p} 1$ as $d \to \infty$.  (31)

By (28) and (31) we have

$\frac{\langle \tilde{v}_d, v_d \rangle}{\|\tilde{v}_d\|\|v_d\|} = \frac{\langle \tilde{v}_d, v_d \rangle / \|v_d\|^2}{\|\tilde{v}_d\| / \|v_d\|} \xrightarrow{p} 1$  (32)

as $d \to \infty$. Therefore

$\mathrm{Angle}(\tilde{v}_d, v_d) = \arccos\left(\frac{\langle \tilde{v}_d, v_d \rangle}{\|\tilde{v}_d\|\|v_d\|}\right) \xrightarrow{p} 0$ as $d \to \infty$.

Note that for this case the condition (ii) is not necessary.

Case 2: $0 \le c < \infty$. Let $\tilde{v}_d$ be as before. By Lemma 3.1 and (13) it follows that

$\sum_{i=1}^{m} \alpha_{i+}^2 \frac{\|Z_i\|^2}{d} \xrightarrow{p} \sum_{i=1}^{m} \frac{1}{m^2}\sigma^2 = \frac{\sigma^2}{m}, \quad \sum_{i=1}^{n} \alpha_{i-}^2 \frac{\|Y_i\|^2}{d} \xrightarrow{p} \sum_{i=1}^{n} \frac{1}{n^2}\sigma^2 = \frac{\sigma^2}{n},$  (33)

as $d \to \infty$. We have that $\|\tilde{v}_d\|^2$ is given by (29) and the first term of (29) is given by (30). Now, by Lemma 3.1, (20) and (33) we have that (30) divided by $d$ converges in probability to $\gamma\sigma^2$, with $\gamma = \frac{1}{m} + \frac{1}{n}$, as $d \to \infty$. By Lemma 3.1 and (22), the second term of (29) divided by $d$ converges in probability to zero as $d \to \infty$. Thus, since $\|v_d\|^2/d \to c^2$, we have that

$\frac{\|\tilde{v}_d\|^2}{d} \xrightarrow{p} \gamma\sigma^2 + c^2$ as $d \to \infty$.  (34)

Dividing both sides of (25) by $d^{1/2}\|v_d\|$, Lemma 3.1 and (21) imply

$\frac{\langle \tilde{v}_d, v_d \rangle}{d^{1/2}\|v_d\|} \xrightarrow{p} c$ as $d \to \infty$.  (35)

Now, by (34) and (35) we have

$\frac{\langle \tilde{v}_d, v_d \rangle}{\|\tilde{v}_d\|\|v_d\|} = \frac{\langle \tilde{v}_d, v_d \rangle / (d^{1/2}\|v_d\|)}{\|\tilde{v}_d\| / d^{1/2}} \xrightarrow{p} \frac{c}{(\gamma\sigma^2 + c^2)^{1/2}}$  (36)
as $d \to \infty$. Therefore

$\mathrm{Angle}(\tilde{v}_d, v_d) = \arccos\left(\frac{\langle \tilde{v}_d, v_d \rangle}{\|\tilde{v}_d\|\|v_d\|}\right) \xrightarrow{p} \arccos\left(\frac{c}{(\gamma\sigma^2 + c^2)^{1/2}}\right)$  (37)

as $d \to \infty$. Note that if $c = 0$ then $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2}) = \pi/2$.

6.2.2 When $v_d^*$ is the normal vector of the MDP hyperplane

Let $\bar{X}$ and $\bar{Y}$ be the mean vectors of the classes $C_+$ and $C_-$, respectively. Let $u_d = \bar{X} - \bar{Y}$ be the MD normal vector. First, we will show that for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$ we have

$\mathrm{Angle}(X_i - \bar{X}, u_d) \xrightarrow{p} \frac{\pi}{2}, \quad \mathrm{Angle}(Y_j - \bar{Y}, u_d) \xrightarrow{p} \frac{\pi}{2}$ as $d \to \infty$.  (38)

Observe that

$\|X_i - \bar{X}\|^2 = \left\|\left(1 - \frac{1}{m}\right) Z_i - \frac{1}{m}\sum_{j \ne i} Z_j\right\|^2 = \left(1 - \frac{1}{m}\right)^2 \|Z_i\|^2 - 2\left(1 - \frac{1}{m}\right)\frac{1}{m}\sum_{j \ne i} \langle Z_i, Z_j \rangle + \frac{1}{m^2}\sum_{k=1}^{d}\Big(\sum_{j \ne i} Z_j^{(k)}\Big)^2.$

By (13), the first term of the last expression divided by $d$ converges in probability to $(1 - 1/m)^2\sigma^2$ as $d \to \infty$. By (20), the second term divided by $d$ converges in probability to zero as $d \to \infty$. Observe that the third term is equal to

$\frac{1}{m^2}\Big(\sum_{j \ne i} \|Z_j\|^2 + 2\sum_{r,s \ne i,\ r<s} \langle Z_r, Z_s \rangle\Big).$

Therefore, (13) and (20) imply that the third term divided by $d$ converges in probability to $(m-1)\sigma^2/m^2$ as $d \to \infty$. Thus we have that

$\frac{\|X_i - \bar{X}\|^2}{d} \xrightarrow{p} \left(1 - \frac{1}{m}\right)^2\sigma^2 + \frac{m-1}{m^2}\sigma^2 = \frac{m-1}{m}\sigma^2$  (39)

as $d \to \infty$. If $\bar{Z}$ is the mean vector of $Z_1, Z_2, \ldots, Z_m$, then from (13) and (20) it follows that

$\frac{\|\bar{Z}\|^2}{d} \xrightarrow{p} \frac{\sigma^2}{m}$ as $d \to \infty$.  (40)

Observe that

$\langle X_i - \bar{X}, u_d \rangle = \langle Z_i - \bar{Z}, \bar{Z} + v_d - \bar{Y} \rangle = \frac{1}{m}\|Z_i\|^2 + \frac{1}{m}\sum_{j \ne i} \langle Z_i, Z_j \rangle + \langle Z_i, v_d \rangle - \langle Z_i, \bar{Y} \rangle - \|\bar{Z}\|^2 - \langle \bar{Z}, v_d \rangle + \langle \bar{Z}, \bar{Y} \rangle.$

By the last equation, if $c = \infty$, from (13), (20), (21) and (40) it follows that

$\frac{\langle X_i - \bar{X}, u_d \rangle}{d^{1/2}\|v_d\|} \xrightarrow{p} 0$ as $d \to \infty$.  (41)

Therefore, by (39), (41) and since $\|u_d\|/\|v_d\| \xrightarrow{p} 1$ as $d \to \infty$ due to (31), we have that

$\cos[\mathrm{Angle}(X_i - \bar{X}, u_d)] = \frac{\langle X_i - \bar{X}, u_d \rangle}{\|X_i - \bar{X}\|\|u_d\|} = \frac{\langle X_i - \bar{X}, u_d \rangle / (d^{1/2}\|v_d\|)}{(\|X_i - \bar{X}\|/d^{1/2})(\|u_d\|/\|v_d\|)} \xrightarrow{p} 0$  (42)

as $d \to \infty$. Similarly, $\cos[\mathrm{Angle}(Y_j - \bar{Y}, u_d)] \xrightarrow{p} 0$ as $d \to \infty$. Thus we have (38). Analogously, but now using (22) and (34), it can be shown that (38) is also true for $0 \le c < \infty$.

The rest of the proof is based on arguments similar to those in the proof of Theorem 3.1 of Bolivar-Cime and Marron (2013), which are presented below. Define $C_d$ as the matrix whose columns are the vectors $X_i - \bar{X}$, $Y_j - \bar{Y}$, for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. By Ahn and Marron (2010), the normal vector of the MDP method is given by $v_d^* = Q_d u_d / \|Q_d u_d\|$, where $Q_d$ is the symmetric projection matrix onto the orthogonal complement of the column space of $C_d$. By (38), $u_d$ tends to be in the orthogonal complement of the column space of $C_d$. Therefore, when $d$ is large $Q_d u_d$ can be approximated by $u_d$, and $v_d^*$ can be approximated by $u_d/\|u_d\|$. Thus $\cos(\mathrm{Angle}(v_d, v_d^*)) = \langle v_d, v_d^* \rangle / (\|v_d\|\|v_d^*\|)$ can be approximated by

$\frac{\langle u_d, v_d \rangle}{\|u_d\|\|v_d\|}.$  (43)

Hence, as was shown in Section 6.2.1, (43) converges to 1 in probability if $c = \infty$, and it converges to $c/(\gamma\sigma^2 + c^2)^{1/2}$ if $0 \le c < \infty$.

6.3 Assumptions that imply condition (ii) of Theorem 3.1

Now it is shown that each of the conditions (I), (II), (III) and (IV) implies the condition (ii) of Theorem 3.1.

(I) In this case

$\frac{\sum_{k,r=1}^{d} \sigma_{k,r} v_d^{(k)} v_d^{(r)}}{d\|v_d\|^2} = \frac{\sum_{k=1}^{d} \sigma_{k,k} v_d^{(k)2}}{d\|v_d\|^2} \le \frac{M}{d} \to 0$ as $d \to \infty$,

where $M$ is a bound of the entries of $\Sigma_d$.

(II) We have that

$\frac{\sum_{k,r=1}^{d} \sigma_{k,r} v_d^{(k)} v_d^{(r)}}{d\|v_d\|^2} \le \frac{\sum_{k=1}^{d} \sigma_{k,k} v_d^{(k)2}}{d\|v_d\|^2} + \frac{2\sum_{k<r} |\sigma_{k,r}||v_d^{(k)}||v_d^{(r)}|}{d\|v_d\|^2} \le \frac{RM}{d} + \frac{(R-1)RM}{d} \to 0$

as $d \to \infty$, where $R$ is the number of nonzero entries of $v_d$, and $M$ is a bound of the second moments of the entries of $Z_1$. In the last inequality it is used that $|\sigma_{k,r}| \le (\sigma_{k,k}\sigma_{r,r})^{1/2} \le M$, and that $|v_d^{(k)}| \le \|v_d\|$ for all $k$.

(III) Suppose that the entries of $v_d$ are uniformly bounded by $R$; then

$\frac{\sum_{k,l=1}^{d} \sigma_{k,l} v_d^{(k)} v_d^{(l)}}{d\|v_d\|^2} \le \frac{R^2 d}{\|v_d\|^2} \cdot \frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2}.$  (44)

Since $d/\|v_d\|^2 \to c^{-2}$ as $d \to \infty$, if

$\frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2} \to 0$ as $d \to \infty$  (45)

then the right-hand side of (44) tends to zero as $d \to \infty$. Now it is shown that (45) holds under conditions a), b) or c).

a) If $M$ is a bound of the second moments of the entries of $Z_1$, then

$\frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2} = \frac{\sum_{k=1}^{d} \sigma_{k,k}}{d^2} + \frac{\sum_{0<|k-l|<r} |\sigma_{k,l}|}{d^2} + \frac{\sum_{|k-l| \ge r} |\sigma_{k,l}|}{d^2} \le \frac{M}{d} + \frac{(2d-r)(r-1)M}{d^2} + \frac{(d-r)(d-r+1)\rho(r)}{d^2},$  (46)

since $|\sigma_{k,r}| \le (\sigma_{k,k}\sigma_{r,r})^{1/2} \le M$. For any $\epsilon > 0$, taking $r$ large enough and later taking $d$ sufficiently large, the right-hand side of (46) is less than $\epsilon$. Therefore, since $\epsilon > 0$ is arbitrary, (45) follows.

b) If $r$ is the number of nonzero upper diagonals of $\Sigma_d$, and $M$ is a bound of the second moments of
the entries of $Z_1$, then

$\frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2} \le \frac{\sum_{k=1}^{d} \sigma_{k,k}}{d^2} + \frac{2\sum_{k<l} |\sigma_{k,l}|}{d^2} \le \frac{M}{d} + \frac{2r(d-1)M}{d^2};$  (47)

in the last inequality it is used that each upper diagonal has at most $d-1$ nonzero elements, and since there are $r$ nonzero upper diagonals, there are at most $r(d-1)$ nonzero elements of $\Sigma_d$ in the upper diagonals, and $|\sigma_{k,r}| \le (\sigma_{k,k}\sigma_{r,r})^{1/2} \le M$. Therefore, since the right-hand side of the last inequality of (47) tends to zero as $d \to \infty$, (45) follows.

c) Note the following:

$\frac{\sum_{k=1}^{d} \lambda_{k,d}^2}{d^2} = \frac{\sum_{k,l=1}^{d} [E(Z_i^{(k)} Z_i^{(l)})]^2}{d^2} = \frac{\sum_{k,l=1}^{d} \sigma_{k,l}^2}{d^2}.$  (48)

Furthermore, if $\|x\|_p = (\sum_{k=1}^{n} |x^{(k)}|^p)^{1/p}$ is the $p$-norm in $\mathbb{R}^n$, since $\|x\|_1 \le n^{1/2}\|x\|_2$, considering the entries of $\Sigma_d$ as a vector in $\mathbb{R}^{d^2}$ we have

$\frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2} \le \frac{d\left(\sum_{k,l=1}^{d} \sigma_{k,l}^2\right)^{1/2}}{d^2} = \left(\frac{\sum_{k,l=1}^{d} \sigma_{k,l}^2}{d^2}\right)^{1/2}.$

Therefore, by (16) and (48) the right-hand side of the last inequality tends to zero and we have (45).

(IV) In this case

$\frac{\sum_{k,r=1}^{d} \sigma_{k,r} v_d^{(k)} v_d^{(r)}}{d\|v_d\|^2} = \frac{\beta_d^2 \sum_{k,l=1}^{d} \sigma_{k,l}}{d \cdot \beta_d^2 d} \le \frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2}.$

Analogously to the case (III), by the last inequality, if (45) holds then (14) follows, and (45) follows from a), b) or c).

6.4 Asymptotic geometric representation of the data in the simulations

In this section we show that the multivariate data considered in the simulation studies have an asymptotic geometric structure as the dimension increases. Since $E(Z_i^{(k)2}) = 1$ and $E[(Z_i^{(k)2} + Z_i^{(k)} - 1)^2] = 3$, for $i = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, d/2$, by the Law
of Large Numbers (LLN) we have

$\frac{\|Z_i\|^2}{d} = \frac{1}{d}\sum_{k=1}^{d/2} Z_i^{(k)2} + \frac{1}{d}\sum_{k=1}^{d/2} (Z_i^{(k)2} + Z_i^{(k)} - 1)^2 \xrightarrow{p} \frac{1+3}{2} = 2$ as $d \to \infty$.

Analogously, since $E[(Z_i^{(k)} - Z_j^{(k)})^2] = 2$ and $E[\{(Z_i^{(k)2} + Z_i^{(k)}) - (Z_j^{(k)2} + Z_j^{(k)})\}^2] = 6$, for $i \ne j$ and $k = 1, 2, \ldots, d/2$, by the LLN we have

$\frac{\|Z_i - Z_j\|^2}{d} = \frac{1}{d}\sum_{k=1}^{d/2} (Z_i^{(k)} - Z_j^{(k)})^2 + \frac{1}{d}\sum_{k=1}^{d/2} [(Z_i^{(k)2} + Z_i^{(k)}) - (Z_j^{(k)2} + Z_j^{(k)})]^2 \xrightarrow{p} \frac{2+6}{2} = 4$ as $d \to \infty$.

Thus the data have an asymptotic geometric representation and satisfy the condition (i) of Theorem 3.1 with $\sigma^2 = 2$.

7 Conclusions

The geometric representation of HDLSS data allows us to analyze the asymptotic behavior of some binary discrimination methods. In particular, under this geometric structure of the data and some conditions, we showed that Support Vector Machine, Distance Weighted Discrimination, Mean Difference and Maximal Data Piling have the same asymptotic behavior, in terms of the angles between the normal vectors of the separating hyperplanes and the optimal direction, as the dimension increases and the sample sizes are fixed. Our results generalize the results of Bolivar-Cime and Marron (2013), where it is showed that the four methods have the same asymptotic behavior in the Gaussian case where the classes have common diagonal covariance matrix. Comparing the asymptotic behavior of the angles between the normal vectors of the separating hyperplanes and the optimal direction with the asymptotic behavior of the error rates, we observe that in some cases the geometric representation of the data allows the consistency property of the error rates to hold even when the normal vectors of the separating hyperplanes are inconsistent with the optimal direction.

Due to the asymptotic geometric structure of the data, the classification with these four methods is easy when the distance between the two classes is large and more difficult when it is small, since as the distance between the two classes increases, the angles between the normal vectors and the optimal direction tend to zero and the error rates approach zero faster when the dimension tends to infinity. In the simulation study presented here, where the sample sizes of the two classes were the same, the conclusions in terms of the angles between the normal vectors and the optimal direction and the conclusions in terms of error rates were similar: although the four methods have the same asymptotic behavior as the
dimension tends to infinity, generally for large dimensions the two methods with the best behavior were MD and DWD, the third best method was SVM, and the worst was MDP. The MD method had the best behavior in terms of the angles of the normal vectors in most of the cases; this is because the asymptotic optimal direction for the normal vector of the separating hyperplane is the difference between the two class means. The results in terms of error rates for unequal sample sizes of the classes are totally different, since in this case, as the dimension increases, the methods MD, SVM and MDP practically coincide, and the DWD method is substantially worse than these methods. However, if the distance between the two classes is sufficiently large, the error rates of the four methods tend to zero as the dimension tends to infinity. As we have observed, the same asymptotic behavior of the error rates of the four methods is related to the same asymptotic behavior of the normal vectors of the four methods, given by the results presented here.

Acknowledgements

The authors are grateful to Aroldo Perez Perez and Victor Perez-Abreu for their help to improve an early version of this manuscript. They also thank the Editor Prof. N. Balakrishnan and the anonymous referees for their important comments and valuable suggestions, which helped greatly to improve this work. The authors thank the Universidad Juárez Autónoma de Tabasco for the support provided during the elaboration of this paper. This work was financed by PRODEP, Grant UJAT-PTC-178.

References

Ahn, J. and Marron, J. S. (2010). The Maximal Data Piling Direction for Discrimination. Biometrika, 97(1).

Ahn, J., Marron, J. S., Muller, K. M., and Chi, Y. (2007). The High-dimension, Low-sample-size Geometric Representation Holds Under Mild Conditions. Biometrika, 94(3).

Aoshima, M. and Yata, K. (2014). A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data. Ann. Inst. Stat. Math., 66.

Bolivar-Cime, A. and Marron, J. S. (2013). Comparison of binary discrimination methods for high dimension low sample size data. J. Multivar. Anal., 115.

Chan, Y.-B. and Hall, P. (2009). Scale adjustments for classifiers in high-dimensional, low sample size settings. Biometrika, 96.
Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press, Cambridge, U.K.

Hall, P., Marron, J. S., and Neeman, A. (2005). Geometric Representation of High Dimension, Low Sample Size Data. J. R. Statist. Soc. B, 67(3).

Jung, S. and Marron, J. S. (2009). PCA Consistency in High Dimension, Low Sample Size Context. Ann. Statist., 37(6B).

Jung, S., Sen, A., and Marron, J. S. (2012). Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA. J. Multivar. Anal., 109.

Marron, J. S. (2015). Distance-weighted discrimination. WIREs Comput. Stat., 7.

Marron, J. S., Todd, M. J., and Ahn, J. (2007). Distance-Weighted Discrimination. J. Am. Statist. Ass., 102(480).

Nakayama, Y., Yata, K., and Aoshima, M. (2017). Support vector machine and its bias correction in high-dimension, low-sample-size settings. J. Stat. Plan. Inference, in press (arXiv preprint).

Qiao, X., Zhang, H. H., Liu, Y., Todd, M. J., and Marron, J. S. (2010). Weighted Distance Weighted Discrimination and Its Asymptotic Properties. J. Am. Statist. Ass., 105(489).

Qiao, X. and Zhang, L. (2015). Flexible high-dimensional classification machines and their asymptotic properties. J. Mach. Learn. Res., 16:1547-1572.

Scholkopf, B. and Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. The MIT Press, Cambridge, Massachusetts.

Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer-Verlag, Berlin.

Yata, K. and Aoshima, M. (2012). Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. J. Multivar. Anal., 105(1).


More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

Separation of Variables

Separation of Variables Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical

More information

Homework 3 - Solutions

Homework 3 - Solutions Homework 3 - Solutions The Transpose an Partial Transpose. 1 Let { 1, 2,, } be an orthonormal basis for C. The transpose map efine with respect to this basis is a superoperator Γ that acts on an operator

More information

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH English NUMERICAL MATHEMATICS Vol14, No1 Series A Journal of Chinese Universities Feb 2005 TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH He Ming( Λ) Michael K Ng(Ξ ) Abstract We

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

Optimal CDMA Signatures: A Finite-Step Approach

Optimal CDMA Signatures: A Finite-Step Approach Optimal CDMA Signatures: A Finite-Step Approach Joel A. Tropp Inst. for Comp. Engr. an Sci. (ICES) 1 University Station C000 Austin, TX 7871 jtropp@ices.utexas.eu Inerjit. S. Dhillon Dept. of Comp. Sci.

More information

Convergence rates of moment-sum-of-squares hierarchies for optimal control problems

Convergence rates of moment-sum-of-squares hierarchies for optimal control problems Convergence rates of moment-sum-of-squares hierarchies for optimal control problems Milan Kora 1, Diier Henrion 2,3,4, Colin N. Jones 1 Draft of September 8, 2016 Abstract We stuy the convergence rate

More information

MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX

MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX J. WILLIAM HELTON, JEAN B. LASSERRE, AND MIHAI PUTINAR Abstract. We investigate an iscuss when the inverse of a multivariate truncate moment matrix

More information

arxiv: v4 [math.pr] 27 Jul 2016

arxiv: v4 [math.pr] 27 Jul 2016 The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,

More information

Generalization of the persistent random walk to dimensions greater than 1

Generalization of the persistent random walk to dimensions greater than 1 PHYSICAL REVIEW E VOLUME 58, NUMBER 6 DECEMBER 1998 Generalization of the persistent ranom walk to imensions greater than 1 Marián Boguñá, Josep M. Porrà, an Jaume Masoliver Departament e Física Fonamental,

More information

Conservation Laws. Chapter Conservation of Energy

Conservation Laws. Chapter Conservation of Energy 20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action

More information

Entanglement is not very useful for estimating multiple phases

Entanglement is not very useful for estimating multiple phases PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The

More information

Unit vectors with non-negative inner products

Unit vectors with non-negative inner products Unit vectors with non-negative inner proucts Bos, A.; Seiel, J.J. Publishe: 01/01/1980 Document Version Publisher s PDF, also known as Version of Recor (inclues final page, issue an volume numbers) Please

More information

LeChatelier Dynamics

LeChatelier Dynamics LeChatelier Dynamics Robert Gilmore Physics Department, Drexel University, Philaelphia, Pennsylvania 1914, USA (Date: June 12, 28, Levine Birthay Party: To be submitte.) Dynamics of the relaxation of a

More information

12.11 Laplace s Equation in Cylindrical and

12.11 Laplace s Equation in Cylindrical and SEC. 2. Laplace s Equation in Cylinrical an Spherical Coorinates. Potential 593 2. Laplace s Equation in Cylinrical an Spherical Coorinates. Potential One of the most important PDEs in physics an engineering

More information

Modelling and simulation of dependence structures in nonlife insurance with Bernstein copulas

Modelling and simulation of dependence structures in nonlife insurance with Bernstein copulas Moelling an simulation of epenence structures in nonlife insurance with Bernstein copulas Prof. Dr. Dietmar Pfeifer Dept. of Mathematics, University of Olenburg an AON Benfiel, Hamburg Dr. Doreen Straßburger

More information

3 The variational formulation of elliptic PDEs

3 The variational formulation of elliptic PDEs Chapter 3 The variational formulation of elliptic PDEs We now begin the theoretical stuy of elliptic partial ifferential equations an bounary value problems. We will focus on one approach, which is calle

More information

SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES

SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES Communications on Stochastic Analysis Vol. 2, No. 2 (28) 289-36 Serials Publications www.serialspublications.com SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES

More information

LECTURE NOTES ON DVORETZKY S THEOREM

LECTURE NOTES ON DVORETZKY S THEOREM LECTURE NOTES ON DVORETZKY S THEOREM STEVEN HEILMAN Abstract. We present the first half of the paper [S]. In particular, the results below, unless otherwise state, shoul be attribute to G. Schechtman.

More information

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016 Noname manuscript No. (will be inserte by the eitor) Scaling properties of the number of ranom sequential asorption iterations neee to generate saturate ranom packing arxiv:607.06668v2 [con-mat.stat-mech]

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

A. Exclusive KL View of the MLE

A. Exclusive KL View of the MLE A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function

More information

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x) Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)

More information

Tractability results for weighted Banach spaces of smooth functions

Tractability results for weighted Banach spaces of smooth functions Tractability results for weighte Banach spaces of smooth functions Markus Weimar Mathematisches Institut, Universität Jena Ernst-Abbe-Platz 2, 07740 Jena, Germany email: markus.weimar@uni-jena.e March

More information

MAT 545: Complex Geometry Fall 2008

MAT 545: Complex Geometry Fall 2008 MAT 545: Complex Geometry Fall 2008 Notes on Lefschetz Decomposition 1 Statement Let (M, J, ω) be a Kahler manifol. Since ω is a close 2-form, it inuces a well-efine homomorphism L: H k (M) H k+2 (M),

More information

WUCHEN LI AND STANLEY OSHER

WUCHEN LI AND STANLEY OSHER CONSTRAINED DYNAMICAL OPTIMAL TRANSPORT AND ITS LAGRANGIAN FORMULATION WUCHEN LI AND STANLEY OSHER Abstract. We propose ynamical optimal transport (OT) problems constraine in a parameterize probability

More information

ON ISENTROPIC APPROXIMATIONS FOR COMPRESSIBLE EULER EQUATIONS

ON ISENTROPIC APPROXIMATIONS FOR COMPRESSIBLE EULER EQUATIONS ON ISENTROPIC APPROXIMATIONS FOR COMPRESSILE EULER EQUATIONS JUNXIONG JIA AND RONGHUA PAN Abstract. In this paper, we first generalize the classical results on Cauchy problem for positive symmetric quasilinear

More information

Quantile function expansion using regularly varying functions

Quantile function expansion using regularly varying functions Quantile function expansion using regularly varying functions arxiv:705.09494v [math.st] 9 Aug 07 Thomas Fung a, an Eugene Seneta b a Department of Statistics, Macquarie University, NSW 09, Australia b

More information

Abstract A nonlinear partial differential equation of the following form is considered:

Abstract A nonlinear partial differential equation of the following form is considered: M P E J Mathematical Physics Electronic Journal ISSN 86-6655 Volume 2, 26 Paper 5 Receive: May 3, 25, Revise: Sep, 26, Accepte: Oct 6, 26 Eitor: C.E. Wayne A Nonlinear Heat Equation with Temperature-Depenent

More information

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

Analyzing Tensor Power Method Dynamics in Overcomplete Regime Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical

More information

Monotonicity of facet numbers of random convex hulls

Monotonicity of facet numbers of random convex hulls Monotonicity of facet numbers of ranom convex hulls Gilles Bonnet, Julian Grote, Daniel Temesvari, Christoph Thäle, Nicola Turchi an Florian Wespi arxiv:173.31v1 [math.mg] 7 Mar 17 Abstract Let X 1,...,

More information

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

Linear First-Order Equations

Linear First-Order Equations 5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)

More information

On conditional moments of high-dimensional random vectors given lower-dimensional projections

On conditional moments of high-dimensional random vectors given lower-dimensional projections Submitte to the Bernoulli arxiv:1405.2183v2 [math.st] 6 Sep 2016 On conitional moments of high-imensional ranom vectors given lower-imensional projections LUKAS STEINBERGER an HANNES LEEB Department of

More information

Center of Gravity and Center of Mass

Center of Gravity and Center of Mass Center of Gravity an Center of Mass 1 Introuction. Center of mass an center of gravity closely parallel each other: they both work the same way. Center of mass is the more important, but center of gravity

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

REAL ANALYSIS I HOMEWORK 5

REAL ANALYSIS I HOMEWORK 5 REAL ANALYSIS I HOMEWORK 5 CİHAN BAHRAN The questions are from Stein an Shakarchi s text, Chapter 3. 1. Suppose ϕ is an integrable function on R with R ϕ(x)x = 1. Let K δ(x) = δ ϕ(x/δ), δ > 0. (a) Prove

More information

Sturm-Liouville Theory

Sturm-Liouville Theory LECTURE 5 Sturm-Liouville Theory In the three preceing lectures I emonstrate the utility of Fourier series in solving PDE/BVPs. As we ll now see, Fourier series are just the tip of the iceberg of the theory

More information

Generalized Tractability for Multivariate Problems

Generalized Tractability for Multivariate Problems Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,

More information

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2.

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2. Assignment 13 Exercise 8.4 For the hypotheses consiere in Examples 8.12 an 8.13, the sign test is base on the statistic N + = #{i : Z i > 0}. Since 2 n(n + /n 1) N(0, 1) 2 uner the null hypothesis, the

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum October 6, 4 ARDB Note Analytic Scaling Formulas for Crosse Laser Acceleration in Vacuum Robert J. Noble Stanfor Linear Accelerator Center, Stanfor University 575 San Hill Roa, Menlo Park, California 945

More information

Lower Bounds for k-distance Approximation

Lower Bounds for k-distance Approximation Lower Bouns for k-distance Approximation Quentin Mérigot March 21, 2013 Abstract Consier a set P of N ranom points on the unit sphere of imension 1, an the symmetrize set S = P ( P). The halving polyheron

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

Math 300 Winter 2011 Advanced Boundary Value Problems I. Bessel s Equation and Bessel Functions

Math 300 Winter 2011 Advanced Boundary Value Problems I. Bessel s Equation and Bessel Functions Math 3 Winter 2 Avance Bounary Value Problems I Bessel s Equation an Bessel Functions Department of Mathematical an Statistical Sciences University of Alberta Bessel s Equation an Bessel Functions We use

More information

The total derivative. Chapter Lagrangian and Eulerian approaches

The total derivative. Chapter Lagrangian and Eulerian approaches Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function

More information

PDE Notes, Lecture #11

PDE Notes, Lecture #11 PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =

More information

Combinatorica 9(1)(1989) A New Lower Bound for Snake-in-the-Box Codes. Jerzy Wojciechowski. AMS subject classification 1980: 05 C 35, 94 B 25

Combinatorica 9(1)(1989) A New Lower Bound for Snake-in-the-Box Codes. Jerzy Wojciechowski. AMS subject classification 1980: 05 C 35, 94 B 25 Combinatorica 9(1)(1989)91 99 A New Lower Boun for Snake-in-the-Box Coes Jerzy Wojciechowski Department of Pure Mathematics an Mathematical Statistics, University of Cambrige, 16 Mill Lane, Cambrige, CB2

More information

A LIMIT THEOREM FOR RANDOM FIELDS WITH A SINGULARITY IN THE SPECTRUM

A LIMIT THEOREM FOR RANDOM FIELDS WITH A SINGULARITY IN THE SPECTRUM Teor Imov r. ta Matem. Statist. Theor. Probability an Math. Statist. Vip. 81, 1 No. 81, 1, Pages 147 158 S 94-911)816- Article electronically publishe on January, 11 UDC 519.1 A LIMIT THEOREM FOR RANDOM

More information

Iterated Point-Line Configurations Grow Doubly-Exponentially

Iterated Point-Line Configurations Grow Doubly-Exponentially Iterate Point-Line Configurations Grow Doubly-Exponentially Joshua Cooper an Mark Walters July 9, 008 Abstract Begin with a set of four points in the real plane in general position. A to this collection

More information

The Generalized Incompressible Navier-Stokes Equations in Besov Spaces

The Generalized Incompressible Navier-Stokes Equations in Besov Spaces Dynamics of PDE, Vol1, No4, 381-400, 2004 The Generalize Incompressible Navier-Stokes Equations in Besov Spaces Jiahong Wu Communicate by Charles Li, receive July 21, 2004 Abstract This paper is concerne

More information

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS

THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS BY ANDREW F. MAGYAR A issertation submitte to the Grauate School New Brunswick Rutgers,

More information

Euler equations for multiple integrals

Euler equations for multiple integrals Euler equations for multiple integrals January 22, 2013 Contents 1 Reminer of multivariable calculus 2 1.1 Vector ifferentiation......................... 2 1.2 Matrix ifferentiation........................

More information

arxiv:hep-th/ v1 3 Feb 1993

arxiv:hep-th/ v1 3 Feb 1993 NBI-HE-9-89 PAR LPTHE 9-49 FTUAM 9-44 November 99 Matrix moel calculations beyon the spherical limit arxiv:hep-th/93004v 3 Feb 993 J. Ambjørn The Niels Bohr Institute Blegamsvej 7, DK-00 Copenhagen Ø,

More information

Diagonalization of Matrices Dr. E. Jacobs

Diagonalization of Matrices Dr. E. Jacobs Diagonalization of Matrices Dr. E. Jacobs One of the very interesting lessons in this course is how certain algebraic techniques can be use to solve ifferential equations. The purpose of these notes is

More information

Jointly continuous distributions and the multivariate Normal

Jointly continuous distributions and the multivariate Normal Jointly continuous istributions an the multivariate Normal Márton alázs an álint Tóth October 3, 04 This little write-up is part of important founations of probability that were left out of the unit Probability

More information

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7. Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency

More information

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13)

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13) Slie10 Haykin Chapter 14: Neuroynamics (3r E. Chapter 13) CPSC 636-600 Instructor: Yoonsuck Choe Spring 2012 Neural Networks with Temporal Behavior Inclusion of feeback gives temporal characteristics to

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

All s Well That Ends Well: Supplementary Proofs

All s Well That Ends Well: Supplementary Proofs All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee

More information

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number

More information

Characterizing Real-Valued Multivariate Complex Polynomials and Their Symmetric Tensor Representations

Characterizing Real-Valued Multivariate Complex Polynomials and Their Symmetric Tensor Representations Characterizing Real-Value Multivariate Complex Polynomials an Their Symmetric Tensor Representations Bo JIANG Zhening LI Shuzhong ZHANG December 31, 2014 Abstract In this paper we stuy multivariate polynomial

More information

The effect of dissipation on solutions of the complex KdV equation

The effect of dissipation on solutions of the complex KdV equation Mathematics an Computers in Simulation 69 (25) 589 599 The effect of issipation on solutions of the complex KV equation Jiahong Wu a,, Juan-Ming Yuan a,b a Department of Mathematics, Oklahoma State University,

More information

Robustness and Perturbations of Minimal Bases

Robustness and Perturbations of Minimal Bases Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important

More information