Binary Discrimination Methods for High Dimensional Data with a Geometric Representation

Binary Discrimination Methods for High Dimensional Data with a Geometric Representation

Addy Bolivar-Cime*, Luis Miguel Cordova-Rodriguez

Universidad Juárez Autónoma de Tabasco, División Académica de Ciencias Básicas

*Corresponding author. División Académica de Ciencias Básicas - UJAT, Carretera Cunduacán-Jalpa KM. 1, Col. La Esmeralda, C.P., addy.bolivar@ujat.mx.

Received: November 11, 2016

Abstract

Four binary discrimination methods are studied in the context of HDLSS data with an asymptotic geometric representation, when the dimension increases while the sample sizes of the classes are fixed. We show that the methods Support Vector Machine, Mean Difference, Distance Weighted Discrimination and Maximal Data Piling have the same asymptotic behavior as the dimension increases. We study the consistent, inconsistent and strongly inconsistent cases in terms of the angles between the normal vectors of the separating hyperplanes of the methods and the optimal direction for classification. A simulation study is done to assess the theoretical results.

1 Introduction

The asymptotic geometric representation of multivariate data as the dimension increases while the sample size is fixed has been studied by Ahn et al. (2007), Hall et al. (2005) and Qiao et al. (2010). They show conditions under which High Dimension, Low Sample Size (HDLSS) data tend to lie deterministically at the vertices of a regular simplex, similarly to multivariate standard Gaussian data. This geometric structure is used to analyze the behavior of some statistical methodologies for multivariate data in the HDLSS setting. In particular, in Ahn et al. (2007), Jung and Marron (2009) and Jung et al. (2012) the behavior of Principal Component Analysis under the geometric representation of HDLSS data is discussed. In Hall et al. (2005) the behavior of some
binary discrimination methods is studied in terms of probability of misclassification, including the methods Support Vector Machine (Cristianini and Shawe-Taylor (2000), Vapnik (1995)), Distance Weighted Discrimination (Marron (2015), Marron et al. (2007)) and Mean Difference (Scholkopf and Smola (2002)), when the data have this asymptotic geometric representation as the dimension increases. In Qiao et al. (2010) a similar analysis is done for the binary discrimination method weighted Distance Weighted Discrimination (wDWD); they also study the asymptotic behavior of this method in terms of the angle between the normal vector of the separating hyperplane and the optimal direction for classification. In Bolivar-Cime and Marron (2013), considering Gaussian data with common diagonal covariance matrix, it is shown that the four methods Support Vector Machine (SVM), Distance Weighted Discrimination (DWD), Mean Difference (MD) and Maximal Data Piling (MDP) (Ahn and Marron (2010)) have the same asymptotic behavior as the dimension increases and the sample sizes are fixed, in terms of the angles between the normal vectors of the separating hyperplanes of the methods. In the present paper we prove that this result of Bolivar-Cime and Marron (2013) holds for more general HDLSS data with an asymptotic geometric representation. Note that, due to the asymptotic geometric representation of the data, if the two classes of the training data set have the same distribution except that one class has mean $v_d$ and the other class has mean zero, the vector $v_d$ is the optimal direction for the normal vector of a separating hyperplane of the data. We show that as the dimension $d$ increases the angles between the normal vectors of the separating hyperplanes of the four methods and $v_d$ converge to zero in probability when $\|v_d\|/d^{1/2} \to \infty$, i.e. the methods are consistent, and converge to $\pi/2$ in probability when $\|v_d\|/d^{1/2} \to 0$, i.e. they are strongly inconsistent. In the case where $\|v_d\|/d^{1/2} \to c$ with $0 < c < \infty$, we show that the angles converge in probability to a number in the interval $(0, \pi/2)$ as the dimension increases, i.e. the four methods are inconsistent. We provide some examples of HDLSS data for which our results are valid. The results of the present paper complement the results of Hall et al. (2005) about the behavior of some binary classification methods. Furthermore, our results extend the results of Ahn et al. (2007) and Qiao et al. (2010), since in Ahn et al. (2007) the method MD is not considered, and in Qiao et al. (2010) only the methods DWD and wDWD are considered. Our results also provide a theoretical explanation of the phenomena observed in the simulation studies of Ahn and Marron (2010) and Marron et al. (2007), in terms of means of misclassification rates. Additionally, we compare the asymptotic behavior of the angles between the normal vectors of the four methods and the optimal direction with the asymptotic behavior of the probabilities of misclassification. We also present a simulation study to numerically assess our theoretical results. As is mentioned in Hall et al. (2005), when $d \ge N$, where $N$ is the sample size of the data, and no $k$ data points lie in a $(k-2)$-dimensional hyperplane (which happens with probability one for data with continuous
probability densities), the training data set is linearly separable. In this paper we restrict our attention to the linearly separable case, assuming that the HDLSS data set treated here is linearly separable with probability one. This paper is divided as follows. In Section 2 we present the geometric representation of some HDLSS data. Our theoretical results about the asymptotic behavior of the normal vectors of the separating hyperplanes of the four methods are presented in Section 3. In Section 4 we present the asymptotic behavior of the probabilities of misclassification of the four methods. In Section 5 we provide a simulation study to evaluate our results. The technical details of the paper are presented in Section 6. Finally, we provide conclusions in Section 7.

2 Geometric representation of high dimensional data

The geometric representation of high dimensional data concerns the geometric structure that multivariate data have as the dimension tends to infinity while the sample size is fixed. This geometric structure can be found, for example, in multivariate standard Gaussian data as the dimension tends to infinity.

2.1 Standard Gaussian geometrical representation

As is mentioned in Hall et al. (2005), if $Z$ has a $d$-variate standard Gaussian distribution, then as $d$ tends to infinity we have

$\|Z\| = d^{1/2} + O_p(1).$  (1)

This means that as the dimension increases the random vector $Z$ tends to lie near the surface of an expanding sphere. Furthermore, if $Z_1$ and $Z_2$ are two independent $d$-variate standard Gaussian vectors, as $d$ increases we have

$\|Z_1 - Z_2\| = (2d)^{1/2} + O_p(1).$  (2)

Thus, the distance between the data vectors is approximately constant as the dimension increases. It is also true that for these vectors, as the dimension increases, we have

$\mathrm{Angle}(Z_1, Z_2) = \frac{\pi}{2} + O_p(d^{-1/2}).$  (3)

That is, the angle between the vectors tends to be an orthogonal angle as the dimension tends to infinity.
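The following minimal numerical sketch (Python with NumPy; the code and all names in it are ours, not the paper's) illustrates (1)-(3): the normalized norm and distance ratios approach 1 and the pairwise angle approaches $\pi/2$ as $d$ grows.

import numpy as np

rng = np.random.default_rng(0)
for d in (10, 100, 1000, 10000):
    Z1 = rng.standard_normal(d)
    Z2 = rng.standard_normal(d)
    cos_angle = Z1 @ Z2 / (np.linalg.norm(Z1) * np.linalg.norm(Z2))
    print(d,
          np.linalg.norm(Z1) / d**0.5,             # -> 1 by (1)
          np.linalg.norm(Z1 - Z2) / (2 * d)**0.5,  # -> 1 by (2)
          np.arccos(cos_angle))                    # -> pi/2 by (3)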

In general, if we have $n$ of these independent $d$-variate standard Gaussian vectors, as the dimension increases all pairwise distances are approximately equal and all pairwise angles are approximately perpendicular. Because all pairwise distances are nearly the same, the $n$ vectors tend to be the vertices of a regular $n$-polyhedron, that is, a polyhedron with $n$ vertices and with edges of the same length. This $n$-polyhedron is called an $n$-simplex.

2.2 General geometrical representation

In Ahn et al. (2007), Hall et al. (2005) and Qiao et al. (2010) it is shown that the approximate $n$-simplex structure of standard Gaussian data can be observed for more general data, as presented below.

Let $X(d) = (X^{(1)}, X^{(2)}, \ldots, X^{(d)})'$ be the vector obtained by truncating an infinite time series, which is written as the vector $X = (X^{(1)}, X^{(2)}, \ldots)$. Let $\mathcal{X}(d) = \{X_1(d), X_2(d), \ldots, X_n(d)\}$ be a random sample of independent and identically distributed random vectors with the same distribution as $X(d)$. Assume the following:

(a) The fourth moments of the entries of the data vectors are uniformly bounded.

(b) For a constant $\sigma^2$,

$\frac{1}{d}\sum_{k=1}^{d} \mathrm{var}(X^{(k)}) \to \sigma^2$ as $d \to \infty$.  (4)

(c) The time series $X$ is $\rho$-mixing for functions that are dominated by quadratics, in the sense that whenever functions $f$ and $g$ of two variables satisfy $|f(u, v)| + |g(u, v)| \le C u^2 v^2$ for a fixed $C > 0$ and all $u$ and $v$, we have

$\sup_{1 \le k, l < \infty,\ |k-l| \ge r} \mathrm{corr}[f(U^{(k)}, V^{(k)}), g(U^{(l)}, V^{(l)})] \le \rho(r),$  (5)

with $(U, V) = (X, X), (X, X')$, where $X'$ is independent of and has the same distribution as $X$, and the function $\rho$ satisfies $\rho(r) \to 0$ as $r \to \infty$.

Under the conditions (a), (b) and (c), in Hall et al. (2005) it is shown that the distance between $X_i(d)$ and $X_j(d)$, for $i \ne j$, is approximately $(2\sigma^2 d)^{1/2}$ when $d$ is large, in the sense that

$\frac{\|X_i - X_j\|^2}{d} \xrightarrow{p} 2\sigma^2$ as $d \to \infty$,  (6)
where $\xrightarrow{p}$ denotes convergence in probability. It is also true that

$\frac{\|X_i\|^2}{d} \xrightarrow{p} \sigma^2$ as $d \to \infty$.  (7)

Therefore, the asymptotic $n$-simplex structure of standard Gaussian data is also observed for these data.

As is mentioned in Ahn et al. (2007), condition (c) states that the variables have to be nearly independent, since they must satisfy a $\rho$-mixing condition. However, this condition is too strict, because it is common to have strong collinearity among variables. Furthermore, this condition depends on the order of the data entries, which can be arbitrary in many applications. In Ahn et al. (2007) and Qiao et al. (2010) it is shown that the asymptotic $n$-simplex structure of high dimensional data can be observed under mild conditions, which are presented below.

Let $X_d = [X_1, X_2, \ldots, X_n]$ be a $d \times n$ data matrix with $d > n$, where the random vectors $X_i = (X_i^{(1)}, X_i^{(2)}, \ldots, X_i^{(d)})'$, $i = 1, 2, \ldots, n$, are independent and identically distributed from a $d$-dimensional multivariate distribution with mean zero and nonnegative definite covariance matrix $\Sigma_d$. Suppose that the eigenvalue decomposition of $\Sigma_d$ is $\Sigma_d = V_d \Lambda_d V_d'$, where $\Lambda_d$ is the diagonal matrix of eigenvalues $\lambda_{1,d} \ge \lambda_{2,d} \ge \cdots \ge \lambda_{d,d} \ge 0$ and $V_d$ is the matrix of corresponding eigenvectors. If $\Sigma_d$ is positive definite (all its eigenvalues are positive), define $Z_d = \Lambda_d^{-1/2} V_d' X_d$, which is a $d \times n$ random data matrix from a distribution with identity covariance matrix. Observe that, if the columns of $X_d$ are Gaussian, the elements of $Z_d$ are independent standard Gaussian univariate variables. The sample covariance matrix is given by $S_d = n^{-1} X_d X_d'$, because the population mean is the zero vector. The dual sample covariance matrix is defined as $S_{D,d} = n^{-1} X_d' X_d$, which is an $n \times n$ matrix. It is important to note that $S_{D,d}$ has the same nonzero eigenvalues as $S_d$. Using the fact that $V_d' V_d$ is the identity, we have

$n S_{D,d} = Z_d' \Lambda_d Z_d = \sum_{i=1}^{d} \lambda_{i,d} W_{i,d},$  (8)

where $W_{i,d} = Z_{i,d} Z_{i,d}'$ and $Z_{i,d}$, for $i = 1, 2, \ldots, d$, are the row vectors of $Z_d$ (written as column vectors). Note that if $X_d$ is Gaussian, then $W_{i,d}$, for $i = 1, 2, \ldots, d$, are independent matrices from the Wishart distribution $W_n(1, I_n)$. Assume the following for the matrix $X_d$:

(a') The fourth moments of the variables are uniformly bounded.

(b') The representation in (8) holds.
(c') The eigenvalues of $\Sigma_d$ are sufficiently diffuse, in the sense that

$\frac{\sum_{i=1}^{d} \lambda_{i,d}^2}{\left(\sum_{i=1}^{d} \lambda_{i,d}\right)^2} \to 0$ as $d \to \infty$.  (9)

(d') The entries of $Z_d$ are independent.

In Ahn et al. (2007) and Qiao et al. (2010) it is shown that under the conditions (a')-(d') the squared distance between $X_i$ and $X_j$, for $i \ne j$, is approximately $2\sum_{i=1}^{d} \lambda_{i,d}$ when $d$ is large, in the sense that

$\frac{\|X_i - X_j\|^2}{\sum_{i=1}^{d} \lambda_{i,d}} \xrightarrow{p} 2$ as $d \to \infty$.  (10)

It is also true that

$\frac{\|X_i\|^2}{\sum_{i=1}^{d} \lambda_{i,d}} \xrightarrow{p} 1$ as $d \to \infty$.  (11)

Therefore the data tend to form an $n$-simplex as the dimension increases. It is also shown in Ahn et al. (2007) that condition (c') is milder than the $\rho$-mixing condition (c). In Jung and Marron (2009) it is shown that condition (d') can be relaxed by assuming that the entries of $Z_d$ are $\rho$-mixing under some permutation; however, this last condition is still very strict. By the results of Yata and Aoshima (2012) we have (10) and (11) under the conditions (a')-(c') and the new condition

$\frac{\sum_{s,t=1}^{d} \lambda_{s,d}\lambda_{t,d}\, E\{(Z_{1(s)}^2 - 1)(Z_{1(t)}^2 - 1)\}}{\mathrm{tr}(\Sigma_d)^2} \to 0$ as $d \to \infty$,  (12)

where $Z_{1(s)}$ is the first element of $Z_{s,d}$. In Yata and Aoshima (2012) it is mentioned that (12) is milder than (d') (or the $\rho$-mixing condition for the entries of $Z_d$), since (12) holds under (c') and (d') (or the $\rho$-mixing condition for the entries of $Z_d$).

3 Asymptotic behavior of the normal vectors

In this section we present a generalization of Theorem 3.1 of Bolivar-Cime and Marron (2013). That theorem states that the asymptotic behavior of the four binary discrimination methods SVM, DWD, MD and MDP is the same as the dimension increases, when the two classes $C_+$ and $C_-$ are Gaussian with means $v_d$ and zero, respectively, and common diagonal covariance matrix $\Sigma_d = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$, where $\{\sigma_k^2\}_{k=1}^{\infty}$ is a bounded sequence of positive numbers such that $\sum_{k=1}^{d} \sigma_k^2/d \to \sigma^2$ as $d \to \infty$, for some $\sigma > 0$. It can be
seen that these data have the asymptotic geometric representation of Section 2.2, and since the difference between the two classes is determined by the mean vector $v_d$, the optimal direction for the normal vector of a separating hyperplane of these data is $v_d$. Specifically, Theorem 3.1 of Bolivar-Cime and Marron (2013) states that, under the above assumptions, when $\|v_d\|/d^{1/2} \to \infty$ the angles between the normal vectors of the separating hyperplanes of the four methods and the optimal direction $v_d$ converge to zero in probability as $d \to \infty$, i.e. the methods are consistent; when $\|v_d\|/d^{1/2} \to 0$ these angles converge to $\pi/2$ in probability, i.e. they are strongly inconsistent; and when $\|v_d\|/d^{1/2} \to c$ with $0 < c < \infty$, these angles converge to $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2})$, where $\gamma = \frac{1}{m} + \frac{1}{n}$ with $m$ and $n$ the sample sizes of $C_+$ and $C_-$, respectively, i.e. they are inconsistent.

Our next theorem claims that when we consider multivariate data with an asymptotic geometric representation, similar to that of multivariate standard Gaussian data or the multivariate data of Section 2.2, the result of Theorem 3.1 of Bolivar-Cime and Marron (2013) still holds under some conditions.

Theorem 3.1 Let $m, n$ be positive integers and let $N = m + n$. Let $Z_1, Z_2, \ldots, Z_N$ be independent and identically distributed $d$-dimensional random vectors, with mean zero and covariance matrix $\Sigma_d$. Let $C_+$ be the class of the random vectors $X_i = Z_i + v_d$, for $i = 1, 2, \ldots, m$, and let $C_-$ be the class of the random vectors $Y_j = Z_{m+j}$, $j = 1, 2, \ldots, n$, where $\|v_d\|/d^{1/2} \to c$, with $0 \le c \le \infty$. Assume the following:

(i) The random vectors have the asymptotic geometric representation

$\frac{\|Z_i\|^2}{d} \xrightarrow{p} \sigma^2$ and $\frac{\|Z_i - Z_j\|^2}{d} \xrightarrow{p} 2\sigma^2$  (13)

as $d \to \infty$ for some $\sigma > 0$, for all $i, j = 1, 2, \ldots, N$ and $i \ne j$.

(ii) The covariance matrix $\Sigma_d = (\sigma_{k,r})$ and the vector $v_d = (v_d^{(1)}, v_d^{(2)}, \ldots, v_d^{(d)})'$ satisfy

$\frac{D_{\Sigma_d}(v_d, 0)^2}{d\|v_d\|^2} = \frac{\sum_{k,r=1}^{d} \sigma_{k,r} v_d^{(k)} v_d^{(r)}}{d\|v_d\|^2} \to 0$ as $d \to \infty$,  (14)

where $D_{\Sigma_d}(x, y) = [(x - y)'\Sigma_d(x - y)]^{1/2}$, $x, y \in \mathbb{R}^d$, is the Mahalanobis distance corresponding to $\Sigma_d$.

Under these conditions, if $v_d^*$ represents the normal vector of the MD, SVM, DWD or MDP hyperplane of the
training data set, then

$\mathrm{Angle}(v_d, v_d^*) \xrightarrow{p} \begin{cases} 0, & \text{if } c = \infty; \\ \pi/2, & \text{if } c = 0; \\ \arccos\left(\dfrac{c}{(\gamma\sigma^2 + c^2)^{1/2}}\right), & \text{if } 0 < c < \infty; \end{cases}$

as $d \to \infty$, where $\gamma = \frac{1}{m} + \frac{1}{n}$.

As in Bolivar-Cime and Marron (2013), we observe in our Theorem 3.1 that the asymptotic behavior of the normal vectors of the four methods is related to the distance between the two classes, in particular to $\|v_d\|$. It is observed that when $\|v_d\|/d^{1/2} \to \infty$ ($c = \infty$) the classification is easier than when $\|v_d\|/d^{1/2} \to 0$ ($c = 0$). This is explained by the geometric representation of the data sets, since the data tend to lie at a distance $\sigma d^{1/2}$ from the mean when $d$ is large. To illustrate the role of $c$ in the last theorem, we present in Figure 1 the intuitive idea of the asymptotic behavior of the data, with $\sigma = 1$ and $0 < c < \infty$. By the asymptotic geometric representation of the data, when the dimension is large the data of the class $C_-$ will be around the sphere of radius $d^{1/2}$ with center at the origin, while the data of the class $C_+$ will be around the sphere of radius $d^{1/2}$ with center $v_d$. We also have that $\|v_d\| \approx c\,d^{1/2}$. Therefore, as $c$ approaches infinity the two spheres are far apart and the classification with the four methods is easier; as $c$ approaches zero the two spheres are very close and the classification is more difficult.

Figure 1: Asymptotic behavior of the data, with $\sigma = 1$ and $0 < c < \infty$. The classification is easier when the two spheres are far apart ($c \to \infty$) and more difficult when the two spheres are very close ($c \to 0$).

Observe that condition (14) is in terms of a Mahalanobis distance between the two class means. Furthermore, since $\Sigma_d = \Sigma_d^{1/2}\Sigma_d^{1/2}$, where $\Sigma_d^{1/2} = V_d \Lambda_d^{1/2} V_d'$, with $\Lambda_d$ the diagonal matrix of eigenvalues of $\Sigma_d$ and $V_d$ the orthogonal matrix of corresponding eigenvectors, we have that $D_{\Sigma_d}(x, y) = \|\Sigma_d^{1/2} x - \Sigma_d^{1/2} y\|$, i.e.
$D_{\Sigma_d}(x, y)$ is the euclidean distance between the vectors obtained from $x$ and $y$ by using the linear transformation $\Sigma_d^{1/2}$. Thus, $D_{\Sigma_d}(v_d, 0) = \|\Sigma_d^{1/2} v_d\|$ is the euclidean distance between that linear transformation of the class means. Hence, (14) is equivalent to $\|\Sigma_d^{1/2} v_d\|^2/(d\|v_d\|^2) \to 0$ as $d \to \infty$. We also have that condition (14) is equivalent to $D_{\Sigma_d}(v_d, 0)/\|v_d\| = o(d^{1/2})$ as $d \to \infty$, which is satisfied in particular if the ratio $D_{\Sigma_d}(v_d, 0)/\|v_d\|$ is bounded. Therefore, condition (14) controls the magnitude of the Mahalanobis distance with respect to the euclidean distance between the two class means. For example, if $v_d = (c\,d^{1/2}, 0, \ldots, 0)'$ with $0 < c < \infty$, and

$\Sigma_d = \begin{bmatrix} 2I_{d/2} & 0 \\ 0 & I_{d/2} \end{bmatrix}$  (15)

with $d$ even, then $D_{\Sigma_d}(v_d, 0)/\|v_d\| = 2^{1/2}$ and condition (14) is satisfied. For this example we observe that the Mahalanobis distance between the two class means is $2^{1/2}$ times the euclidean distance between them.

Remark 3.1 The condition (i) of the last theorem is satisfied if conditions (a)-(c) of Section 2.2 hold. If $\lambda_{1,d} \ge \cdots \ge \lambda_{d,d}$ are the eigenvalues of $\Sigma_d$ and $\sum_{i=1}^{d} \lambda_{i,d}/d \to \sigma^2$ as $d \to \infty$, for some $\sigma > 0$, then condition (i) is also satisfied under the conditions (a')-(d') of Section 2.2, or the conditions (a')-(c') and (12), because of (10) and (11).

Remark 3.2 There are several cases where the condition (ii) is satisfied; some of them are the following (see Section 6 for details):

(I) The covariance matrix $\Sigma_d$ is a diagonal matrix with entries uniformly bounded.

(II) The vector $v_d$ has a fixed number of nonzero entries, and the second moments of the entries of $Z_1$ are uniformly bounded.

(III) The entries of $v_d$ are uniformly bounded, $\|v_d\|/d^{1/2} \to c$ with $0 < c < \infty$, and one of the following conditions is satisfied:

a) the entries of $Z_1$ have second moments uniformly bounded, and are $\rho$-mixing in the sense that

$\sup_{|k-l| \ge r} |E(Z_1^{(k)} Z_1^{(l)})| = \sup_{|k-l| \ge r} |\sigma_{k,l}| = \rho(r) \to 0$ as $r \to \infty$;

b) the entries of $Z_1$ have second moments uniformly bounded, and $\Sigma_d$ has a fixed number of nonzero upper diagonals;
c) the eigenvalues of $\Sigma_d$, $\lambda_{1,d} \ge \lambda_{2,d} \ge \cdots \ge \lambda_{d,d} \ge 0$, satisfy

$\frac{\sum_{k=1}^{d} \lambda_{k,d}^2}{d^2} \to 0$ as $d \to \infty$.  (16)

(IV) The vector $v_d$ has the form $v_d = \beta_d \mathbf{1}_d$, with $\beta_d \ge 0$ such that $\beta_d \to c$ as $d \to \infty$, where $0 \le c \le \infty$, and one of the conditions a), b) or c) of (III) is satisfied.

By Remark 3.1, the multivariate Gaussian data of Theorem 3.1 of Bolivar-Cime and Marron (2013) satisfy the condition (i) of Theorem 3.1. These data also satisfy the condition (I) of Remark 3.2, therefore the condition (ii) of Theorem 3.1 is satisfied as well. In this sense Theorem 3.1 generalizes Theorem 3.1 of Bolivar-Cime and Marron (2013), by extending this result to more general multivariate data with an asymptotic geometric representation. Theorem 3.1 also provides a theoretical explanation of the simulation results presented in Ahn and Marron (2010) and Marron et al. (2007), where it is observed that some of the considered binary discrimination methods have approximately the same behavior, in terms of means of error rates (misclassification rates), as the dimension increases.

For the proof of Theorem 3.1 we need the next lemma, which is also a generalization of a result in Bolivar-Cime and Marron (2013). As is explained in Bolivar-Cime and Marron (2013), the normal vectors of the MD, SVM and DWD hyperplanes are proportional to the difference between two points on the convex hulls of the two classes. The next lemma provides an explicit asymptotic representation for these differences, when $0 \le c < \infty$ (the inconsistent cases). We denote by $\alpha = (\alpha_+', \alpha_-')'$ an $N$-dimensional vector, where $\alpha_+ = (\alpha_{1+}, \alpha_{2+}, \ldots, \alpha_{m+})'$ and $\alpha_- = (\alpha_{1-}, \alpha_{2-}, \ldots, \alpha_{n-})'$ are subvectors of $\alpha$ of dimensions $m$ and $n$, respectively.

Lemma 3.1 Assume the same as in Theorem 3.1. Suppose that $\|v_d\|/d^{1/2} \to c$ with $0 \le c < \infty$. Let $X = [X_1, X_2, \ldots, X_m]$ and $Y = [Y_1, Y_2, \ldots, Y_n]$. If the vector $\tilde{v}_d = X\alpha_+ - Y\alpha_-$, with $\alpha \ge 0$ and $\mathbf{1}_m'\alpha_+ = \mathbf{1}_n'\alpha_- = 1$, is proportional to the normal vector of the MD, SVM or DWD hyperplane, we have that

$\alpha_{i+} \xrightarrow{p} \frac{1}{m}, \quad \alpha_{j-} \xrightarrow{p} \frac{1}{n},$  (17)

as $d \to \infty$, for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.

Thus, in the inconsistent cases the normal vectors of SVM and DWD are approximately in the same direction as the normal vector of MD when $d$ is large. This is also true for MDP, as we will see in the proof of Theorem 3.1. Due to the asymptotic geometric representation of the data (see Figure 1), in Theorem 3.1 we have that as $c \to \infty$ the angles between the normal vectors of the four methods and $v_d$ tend to zero as $d$
increases; that is, in this case the direction of the four methods is approximately the direction of $v_d$ when $d$ is large.

4 Asymptotic properties of the probabilities of misclassification

Assume the same as in Theorem 3.1. Suppose $\|v_d\|/d^{1/2} \to c$, with $0 < c < \infty$. Due to the asymptotic geometric representation of the data, by Hall et al. (2005) and Qiao and Zhang (2015) we have the following two results about the asymptotic error rates of the SVM, MD and DWD hyperplanes.

Theorem 4.1 Assume that $n \ge m$; if need be, interchange $X$ and $Y$ to achieve this. If $c > \sigma(1/m - 1/n)^{1/2}$, then the probability that a new datum from either the $X$-population or the $Y$-population is correctly classified by the SVM or the MD hyperplane converges to 1 as $d \to \infty$. If $c < \sigma(1/m - 1/n)^{1/2}$, then with probability converging to 1 as $d \to \infty$ a new datum from either population will be classified by the SVM or the MD hyperplane as belonging to the $Y$-population.

Theorem 4.2 Assume that $n \ge m$; if need be, interchange $X$ and $Y$ to achieve this. If $c > \sigma[(n/m)^{1/2}/m - 1/n]^{1/2}$, then the probability that a new datum from either the $X$-population or the $Y$-population is correctly classified by the DWD hyperplane converges to 1 as $d \to \infty$. If $c < \sigma[(n/m)^{1/2}/m - 1/n]^{1/2}$, then with probability converging to 1 as $d \to \infty$ a new datum from either population will be classified by the DWD hyperplane as belonging to the $Y$-population.

As we mentioned before, under the hypotheses of Theorem 3.1 and if $0 < c < \infty$, the normal vector of MDP is approximately in the same direction as the normal vector of MD when $d$ is large. Therefore, if we take the intercept of the MDP hyperplane as $b = (v_d^*)'(\bar{X} + \bar{Y})/2$, where $\bar{X}$ and $\bar{Y}$ are the class means of the $X$ and $Y$ populations, respectively, and $v_d^*$ is the normal vector of MDP, then the MDP hyperplane coincides with the MD hyperplane as $d$ tends to infinity. Thus, Theorem 4.1 holds for the MDP method.

By the above results, if $m = n$ the four methods give asymptotically correct classification of a new datum from any population, for all $0 < c < \infty$. In the case where the sample sizes $m$ and $n$ are unequal, for example if $n > m$, define $M_1 = \sigma(1/m - 1/n)^{1/2}$ and $M_2 = \sigma[(n/m)^{1/2}/m - 1/n]^{1/2}$, and note that $M_2 > M_1$. By the last theorems, if $c > M_2$ the four methods give asymptotically correct classification of a new datum from any population; if $M_2 > c > M_1$ then SVM, MD and MDP give asymptotically correct classification of a new datum from any population, while DWD gives asymptotically perfect classification for the $Y$-population and asymptotically completely incorrect classification for the
$X$-population. This shows an asymptotic advantage of SVM, MD and MDP over DWD, in the sense of classifying correctly new data from any population for a wider range of values of $c$ as $d$ tends to infinity.

Observe that if $n \ge m$ and $c > M_2$, by the last results the four methods have the consistency property of the error rates, in the sense that their error rates tend to zero as $d$ tends to infinity, and by Theorem 3.1 the four methods have the inconsistency property of the normal vectors, in the sense that the angles between their normal vectors and the optimal direction do not tend to zero as $d$ tends to infinity. That is, in this case the asymptotic geometric representation of the data allows us to find separating hyperplanes that give perfect classification even when their normal vectors are not in the same direction as the optimal direction. By Theorem 3.1 we have that the limit of the angles between the normal vectors of the four methods and the optimal direction approaches zero as $c$ tends to infinity; however, it is sufficient to have $c > M_2$ in order to have asymptotically correct classification of a new datum from any population with the four methods, and in this situation the limit of the angles between the normal vectors of the four methods and the optimal direction is at most $\arccos\left(\dfrac{M_2}{(\gamma\sigma^2 + M_2^2)^{1/2}}\right)$.

In the case when $n$ and $m$ are unequal, the above results also show that the classification is easier when $c \to \infty$ than when $c \to 0$. In the case when $n = m$ this is also true, since, as we will see in the simulation study of Section 5, even when the four methods give asymptotically correct classification of a new datum from any population for all $0 < c < \infty$, as $c$ increases the convergence of the error rates to zero as $d$ tends to infinity is faster.

Aoshima and Yata (2014) and Nakayama et al. (2017) proposed a bias-corrected MD and a bias-corrected SVM, respectively, to improve the performance of the error rates of MD and SVM. Theorem 3.1 also holds for the bias-corrected classifiers, since it is only about the normal vectors of the separating hyperplanes. Assume the hypotheses of Theorem 3.1. Consider the bias-corrected MD proposed by Aoshima and Yata (2014), named the distance-based classifier, which is defined as follows: one classifies an individual $X_0$ into $C_+$ if $W(X_0) < 0$ and into $C_-$ otherwise, where

$W(X_0) = \left(X_0 - \frac{\bar{X} + \bar{Y}}{2}\right)'(\bar{Y} - \bar{X}) - \frac{\mathrm{tr}\,S_+}{2m} + \frac{\mathrm{tr}\,S_-}{2n},$

and $S_+$ and $S_-$ are the sample covariance matrices for $C_+$ and $C_-$, respectively. Here, $-\mathrm{tr}\,S_+/(2m) + \mathrm{tr}\,S_-/(2n)$ is a bias-correction term. This classifier is equivalent to the scale adjusted distance-based classifier given by Chan and Hall (2009). From Theorem 1 of Aoshima and Yata (2014), the error rates of $W(X_0)$ tend to zero as $d \to \infty$ if

$\frac{D_{\Sigma_d}(v_d, 0)^2}{\|v_d\|^4} \to 0$ and $\frac{\mathrm{tr}(\Sigma_d^2)}{\min(m, n)\|v_d\|^4} \to 0$, as $d \to \infty$.  (18)

As a referee pointed out, if (I) in Remark 3.2 holds, then $\mathrm{tr}(\Sigma_d^2) = O(d)$ and $D_{\Sigma_d}(v_d, 0)^2 = O(\|v_d\|^2)$. Therefore, if $\|v_d\|^2/d \to c^2 > 0$ then (18) holds and the error rates of the classifier $W(X_0)$ tend to zero
as $d \to \infty$, even when the angle between the normal vector of the separating hyperplane and the optimal direction does not tend to zero. Observe that (18) holds even when $\|v_d\| = d^{\delta}$ with $\delta \in (1/4, 1/2)$, which corresponds to the strongly inconsistent case of Theorem 3.1 since $\|v_d\|/d^{1/2} \to 0$. That is, in some cases the classifier $W(X_0)$ can have the consistency property of the error rates even when the normal vector is strongly inconsistent with the optimal direction. This shows the good properties of the bias-corrected classifiers.

5 Simulation study

In this section we present a simulation study to illustrate numerically the theoretical results presented previously. In Bolivar-Cime and Marron (2013) a simulation study is presented considering Gaussian data with identity covariance matrix. In the simulations of the present paper we take more general multivariate data with an asymptotic geometric representation, to illustrate the asymptotic behavior of the four considered binary discrimination methods as the dimension tends to infinity. To compare our results with those of Bolivar-Cime and Marron (2013), we take the same mean vectors $v_d$ considered in that paper. We take $v_d = (d^{\delta}, 0, \ldots, 0)'$ with $\delta = 0.2$, $\delta = 0.5$ and $\delta = 0.8$, which correspond to the cases $c = 0$, $c = 1$ and $c = \infty$ of Theorem 3.1, respectively. We also consider $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5$, $\beta = 1$ and $\beta = 10$, which correspond to the cases $c = 0.5$, $c = 1$ and $c = 10$, respectively. We consider sample sizes $m = n = 20$, thus $\gamma = 1/m + 1/n = 1/10$. We take the dimensions $d = 10, 30, 100, 300, 1000, 2000$ to consider the non-HDLSS and HDLSS settings. The number of training data sets generated for each value of $d$ is $M = 500$.

In Ahn and Marron (2010) the MDP method is defined when $d \ge N - 1$, where $N = m + n$. They also mention that a formula for the normal vector of MDP is equivalent to Fisher's discriminant vector when $d \le N - 2$, which does not have the piling property. We can view MDP as the HDLSS version of Fisher's linear discriminant method, with zero within-class scatter and maximized between-class scatter. Hence, in the simulation study presented here, we take the MDP normal vector as Fisher's discriminant vector when $d \le N - 2$.

Suppose $d$ is even. Let $Z = (Z^{(1)}, Z^{(2)}, \ldots, Z^{(d)})'$ be a random vector where $Z^{(i)}$, for $i = 1, 2, \ldots, d/2$, are independent and identically distributed random variables from the univariate standard normal distribution, and where $Z^{(j)} = Z^{(i)2} + Z^{(i)} - 1$, for $j = d/2 + i$ with $i = 1, 2, \ldots, d/2$. Note that $Z$ has mean zero, and it can be seen that its covariance matrix is given by

$\Sigma_d = \begin{bmatrix} I_{d/2} & I_{d/2} \\ I_{d/2} & 3I_{d/2} \end{bmatrix}.$  (19)
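As a sanity check of this construction, the following sketch (Python with NumPy; our illustration, not the authors' code) draws vectors with this distribution and verifies empirically that the covariance has the block form (19), and that $\|Z_i\|^2/d \approx 2$ and $\|Z_i - Z_j\|^2/d \approx 4$, in agreement with condition (i) and $\sigma^2 = 2$ (see Section 6.4).

import numpy as np

def draw_Z(d, size, rng):
    # First d/2 entries are i.i.d. N(0,1); entry d/2 + i equals
    # Z^(i)^2 + Z^(i) - 1, giving mean zero and covariance (19).
    W = rng.standard_normal((size, d // 2))
    return np.hstack([W, W**2 + W - 1])

rng = np.random.default_rng(1)
d = 1000
Z = draw_Z(d, 5000, rng)
C = Z.T @ Z / Z.shape[0]                          # empirical covariance
print(C[0, 0], C[0, d // 2], C[d // 2, d // 2])   # approx 1, 1, 3 as in (19)
print(np.sum(Z[0]**2) / d)                        # approx sigma^2 = 2
print(np.sum((Z[0] - Z[1])**2) / d)               # approx 2*sigma^2 = 4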

Let $Z_1, Z_2, \ldots, Z_N$ be independent and identically distributed random vectors with the same distribution as $Z$. In Section 6.4 it is shown that these data have an asymptotic geometric representation; in particular, the data satisfy condition (i) of Theorem 3.1 with $\sigma^2 = 2$. Now we will see that the data satisfy condition (ii) of Theorem 3.1. It is clear that the second moments of the entries of the vector $Z_1$ are uniformly bounded, since the distribution of the entries is only of two types (the standard normal distribution or the distribution of $Z^{(i)2} + Z^{(i)} - 1$, where $Z^{(i)}$ has standard normal distribution), and these distributions have finite second moments which are independent of the value of $d$. Observe that if $v_d = (d^{\delta}, 0, \ldots, 0)'$ with $\delta > 0$, by (II) of Remark 3.2 we have condition (ii) of Theorem 3.1. On the other hand, if $v_d = \beta\mathbf{1}_d$ with $\beta > 0$, by (IVb) of Remark 3.2 we also have the condition (ii) of Theorem 3.1, since $\Sigma_d$ has a fixed number of nonzero upper diagonals.

5.1 Behavior of the normal vectors of the four methods

The means of the angles between $v_d$ and the normal vectors of the separating hyperplanes of the four considered methods are computed for each value of $d$, in all the considered settings. In Figure 2 we show the means of the angles between $v_d$ and the normal vectors varying the dimension, for the case when $v_d = (d^{\delta}, 0, \ldots, 0)'$ with $\delta = 0.2$, $\delta = 0.5$ and $\delta = 0.8$. We observe that when $\delta = 0.2$ the means of the angles between the optimal vector $v_d$ and the normal vectors of the separating hyperplanes of the four methods tend to approximate $\pi/2 \approx 1.5708$ as the dimension increases. When $\delta = 0.8$ the means of the angles tend to zero as the dimension increases. In this case, when $\delta = 0.5$ the means of the angles tend to $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2}) \approx 0.4205$ as the dimension increases, where $c = 1$ ($\gamma = 1/10$ and $\sigma^2 = 2$). These results are in accordance with Theorem 3.1.

In Figure 3 we show the means of the angles between $v_d$ and the normal vectors of the four considered methods varying the dimension, when $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5, 1, 10$. It is observed that when $\beta$ is equal to 0.5, 1 and 10 the means of the angles approximate 0.7297, 0.4205 and 0.0447 as the dimension increases, respectively, which are the values of $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2})$ with $c$ equal to 0.5, 1 and 10, respectively. Again, these results are in accordance with Theorem 3.1.

For these data we observe that, although the four methods have the same asymptotic behavior as the dimension tends to infinity in terms of the angles between the normal vectors and the optimal direction, the two best methods are MD and DWD, with MD the method with the best behavior in most of the cases. Note that in the case where all the methods are consistent, MD is the method that converges fastest to the optimal direction; this is because the asymptotic optimal direction for the normal vector of the separating hyperplane is the difference between the two class means.

Figure 2: Means of the angles between $v_d$ and the normal vectors of the separating hyperplanes of the four methods, considering $v_d = (d^{\delta}, 0, \ldots, 0)'$ with $\delta = 0.2, 0.5, 0.8$.

In the HDLSS situation DWD sometimes coincides with MD. The third best method is SVM, and the worst method is MDP in almost all the considered cases. It is also observed that MDP has its worst behavior when $d$ is close to $N = 40$, which has been previously noted in the simulations of Bolivar-Cime and Marron (2013), and in the simulations of Ahn and Marron (2010) for Gaussian data in terms of means of misclassification rates.

5.2 Behavior of the error rates of the four methods

In order to compare the error rates of the four methods we take $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5, 1, 10$. Classification error rates were computed taking 100 new data points from each of the two classes. In Figure 4 we show the means of the error rates of the four considered methods for the cases $\beta = 0.5$ and $\beta = 1$, which correspond to the cases $c = 0.5$ and $c = 1$, respectively. In this figure we observe the convergence to zero of the means of the error rates of the four methods as $d \to \infty$, even though in Figure 3 we observe that the means of the angles between the normal vectors of the separating hyperplanes and the optimal direction do not converge to zero as $d \to \infty$. For the case $\beta = 10$, corresponding to the case $c = 10$, the means of the error rates of the four methods for all the considered values of $d$ were practically zero, therefore we do not include the graphs of the error rates for this case.

Figure 3: Means of the angles between $v_d$ and the normal vectors of the separating hyperplanes of the four methods, considering $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5, 1, 10$.

In Figure 4 we observe that generally the error rates of the methods DWD and MD are the smallest, and in the HDLSS situation these error rates practically coincide. The third best method in terms of error rates is SVM, and the worst method is MDP. Note that similar conclusions were obtained in Section 5.1 in terms of the angles between the normal vectors of the methods and the optimal direction; however, DWD sometimes has smaller error rates than MD, and MD generally has smaller angles than DWD. We also observe that as $c$ increases the convergence of the error rates to zero is faster.

In Hall et al. (2005) the behavior of the error rates of the methods MD, SVM and DWD when $n$ and $m$ are unequal is studied by simulations. They observe in their simulations that when $d$ tends to infinity the error rates of MD and SVM practically coincide, and DWD does substantially worse than these methods, considering several values of $c > 0$. We did similar simulations but now including the MDP method (not shown here to save space), and we observed that when $d$ tends to infinity the error rates of MDP practically coincide with the error rates of MD and SVM. This was expected, since for $0 < c < \infty$ the hyperplanes of these three methods coincide as $d$ tends to infinity due to the asymptotic geometric representation of the data, and by the results of Section 4 these three methods behave better than DWD in terms of error rates.
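To connect the angle and error-rate viewpoints numerically, the sketch below (Python with NumPy; a hedged illustration under our own naming, not the authors' simulation code) builds one training set from the model of this section with $v_d = \beta\mathbf{1}_d$, computes the MD direction, and compares the empirical angle with the limit $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2})$ of Theorem 3.1, together with the empirical error rate of the MD hyperplane on new data.

import numpy as np

rng = np.random.default_rng(2)
d, m, n, beta, sigma2 = 2000, 20, 20, 0.5, 2.0   # c = beta, sigma^2 = 2

def draw_Z(size):
    W = rng.standard_normal((size, d // 2))
    return np.hstack([W, W**2 + W - 1])          # covariance (19)

v = beta * np.ones(d)                            # ||v|| / d^(1/2) = beta
Xtr, Ytr = draw_Z(m) + v, draw_Z(n)              # classes C+ and C-
u = Xtr.mean(axis=0) - Ytr.mean(axis=0)          # MD normal vector

angle = np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
gamma, c = 1/m + 1/n, beta
print(angle, np.arccos(c / np.sqrt(gamma * sigma2 + c**2)))  # empirical vs limit

# Error rate of the MD hyperplane passing through the midpoint of the means.
b = u @ (Xtr.mean(axis=0) + Ytr.mean(axis=0)) / 2
Xte, Yte = draw_Z(100) + v, draw_Z(100)
err = (np.mean(Xte @ u - b < 0) + np.mean(Yte @ u - b > 0)) / 2
print(err)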

Figure 4: Means of the error rates of the four methods, considering $v_d = \beta\mathbf{1}_d$ with $\beta = 0.5, 1$.

6 Technical details

The main ideas for the proofs of our results are similar to those of Bolivar-Cime and Marron (2013); however, we use the Tchebysheff and the Cauchy-Schwartz inequalities. First we present some consequences of the hypotheses of Theorem 3.1 that will be very useful along this section. Let $\langle x, y \rangle = \sum_{k=1}^{d} x^{(k)} y^{(k)}$ be the inner product of the vectors $x, y \in \mathbb{R}^d$. By condition (i) we have

$\frac{\langle Z_i, Z_j \rangle}{d} = \frac{1}{2}\left(\frac{\|Z_i\|^2}{d} + \frac{\|Z_j\|^2}{d} - \frac{\|Z_i - Z_j\|^2}{d}\right) \xrightarrow{p} 0, \quad \text{for } i \ne j,$  (20)

as $d \to \infty$. Furthermore, for any $i = 1, 2, \ldots, N$, by condition (ii) and the Tchebysheff inequality, for all $\tau > 0$,

$P\left(\frac{|\langle Z_i, v_d \rangle|}{d^{1/2}\|v_d\|} > \tau\right) \le \frac{E(\langle Z_i, v_d \rangle^2)}{\tau^2 d\|v_d\|^2} = \frac{\sum_{k,l=1}^{d} E(Z_i^{(k)} Z_i^{(l)}) v_d^{(k)} v_d^{(l)}}{\tau^2 d\|v_d\|^2} = \frac{1}{\tau^2}\,\frac{\sum_{k,l=1}^{d} \sigma_{k,l} v_d^{(k)} v_d^{(l)}}{d\|v_d\|^2} \to 0$

as $d \to \infty$. Thus for all $i = 1, 2, \ldots, N$,

$\frac{\langle Z_i, v_d \rangle}{d^{1/2}\|v_d\|} \xrightarrow{p} 0$ as $d \to \infty$.  (21)

Note that if $0 \le c < \infty$ then (21) implies

$\frac{\langle Z_i, v_d \rangle}{d} = \frac{\langle Z_i, v_d \rangle}{d^{1/2}\|v_d\|} \cdot \frac{\|v_d\|}{d^{1/2}} \xrightarrow{p} 0 \cdot c = 0$ as $d \to \infty$.  (22)
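The limits (20) and (21) are easy to see numerically; here is a small sketch (our code, using the simulation distribution of Section 5 and $v_d = \beta\mathbf{1}_d$ for concreteness):

import numpy as np

rng = np.random.default_rng(3)
for d in (100, 1000, 10000):
    W = rng.standard_normal((2, d // 2))
    Z = np.hstack([W, W**2 + W - 1])             # two independent copies of Z
    v = 0.5 * np.ones(d)                         # v_d = beta * 1_d
    print(d,
          Z[0] @ Z[1] / d,                            # (20): -> 0
          Z[0] @ v / (d**0.5 * np.linalg.norm(v)))    # (21): -> 0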

6.1 Proof of Lemma 3.1

Observe the following:

$\|X_i - Y_j\|^2 = \|Z_i + v_d - Y_j\|^2 = \|Z_i\|^2 + \|Y_j\|^2 + \|v_d\|^2 - 2\langle Z_i, Y_j \rangle + 2\langle Z_i, v_d \rangle - 2\langle Y_j, v_d \rangle.$

Therefore, by (13), (20) and (22) it follows that

$\frac{\|X_i - Y_j\|^2}{d} \xrightarrow{p} 2\sigma^2 + c^2$ as $d \to \infty$.  (23)

We also have by (13) that for $i \ne j$

$\frac{\|X_i - X_j\|^2}{d} \xrightarrow{p} 2\sigma^2, \quad \frac{\|Y_i - Y_j\|^2}{d} \xrightarrow{p} 2\sigma^2$ as $d \to \infty$.  (24)

Thus (23) and (24) imply that the vectors $X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n$ tend to be the vertices of an $N$-polyhedron as $d \to \infty$. The rest of the proof is based on arguments similar to those in the proof of Lemma 3.1 of Bolivar-Cime and Marron (2013), which are presented below. The asymptotic $N$-polyhedron has $m$ of its vertices arranged as those of an $m$-simplex and the other $n$ vertices arranged in an $n$-simplex. After rescaling by $d^{-1/2}$, when $d$ tends to infinity the data in $C_+$ and $C_-$ tend to be the vertices of an $m$-simplex and an $n$-simplex, respectively, with edges of length $2^{1/2}\sigma$. Let $X_1, \ldots, X_m$ be the vertices of the $m$-simplex and let $Y_1, \ldots, Y_n$ be the vertices of the $n$-simplex. Let $\tilde{v}_d = X\alpha_+ - Y\alpha_-$, with $\alpha \ge 0$ and $\mathbf{1}_m'\alpha_+ = \mathbf{1}_n'\alpha_- = 1$, be proportional to the normal vector of the MD, SVM or DWD hyperplane. For the two classes of the $N$-polyhedron, it is shown in the proof of Lemma 3.1 of Bolivar-Cime and Marron (2013) that this $\alpha$ is given by

$\alpha_{i+} = \frac{1}{m}, \quad \alpha_{j-} = \frac{1}{n}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n,$

for these three methods. Therefore we have (17).
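A quick numerical look at (23) and (24) under the Section 5 distribution ($\sigma^2 = 2$) with $v_d = \beta\mathbf{1}_d$, so that $c = \beta$ (our sketch, not from the paper):

import numpy as np

rng = np.random.default_rng(4)
d, beta = 20000, 0.5
W = rng.standard_normal((3, d // 2))
Z = np.hstack([W, W**2 + W - 1])
X1, X2, Y1 = Z[0] + beta, Z[1] + beta, Z[2]        # v_d = beta * 1_d
print(np.sum((X1 - Y1)**2) / d, 2 * 2 + beta**2)   # (23): approx 2*sigma^2 + c^2
print(np.sum((X1 - X2)**2) / d, 2 * 2)             # (24): approx 2*sigma^2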

6.2 Proof of Theorem 3.1

6.2.1 When $v_d^*$ is the normal vector of the MD, SVM or DWD hyperplane

Case 1: $c = \infty$. Let $\tilde{v}_d = \sum_{i=1}^{m} \alpha_{i+} X_i - \sum_{i=1}^{n} \alpha_{i-} Y_i$ be proportional to the vector $v_d^*$, with $\alpha \ge 0$ and $\mathbf{1}_m'\alpha_+ = \mathbf{1}_n'\alpha_- = 1$, as in Lemma 3.1. We have

$\langle \tilde{v}_d, v_d \rangle = \sum_{i=1}^{m} \alpha_{i+} \langle Z_i, v_d \rangle - \sum_{i=1}^{n} \alpha_{i-} \langle Y_i, v_d \rangle + \|v_d\|^2.$  (25)

Note that by the Cauchy-Schwartz inequality $|\langle Z_i, v_d \rangle| \le \|Z_i\|\|v_d\|$, therefore

$\frac{|\langle Z_i, v_d \rangle|}{\|v_d\|^2} \le \frac{\|Z_i\|}{d^{1/2}} \cdot \frac{d^{1/2}}{\|v_d\|} \xrightarrow{p} \sigma \cdot 0 = 0$  (26)

as $d \to \infty$, for $i = 1, 2, \ldots, N$. Since $0 \le \alpha_{i+}, \alpha_{i-} \le 1$, it follows that

$\frac{\sum_{i=1}^{m} \alpha_{i+} \langle Z_i, v_d \rangle}{\|v_d\|^2} \xrightarrow{p} 0, \quad \frac{\sum_{i=1}^{n} \alpha_{i-} \langle Y_i, v_d \rangle}{\|v_d\|^2} \xrightarrow{p} 0$ as $d \to \infty$.  (27)

Thus, by (25) we have

$\frac{\langle \tilde{v}_d, v_d \rangle}{\|v_d\|^2} \xrightarrow{p} 1$ as $d \to \infty$.  (28)

We also have

$\|\tilde{v}_d\|^2 = \left\|\sum_{i=1}^{m} \alpha_{i+} Z_i - \sum_{i=1}^{n} \alpha_{i-} Y_i\right\|^2 + 2\left(\sum_{i=1}^{m} \alpha_{i+} \langle Z_i, v_d \rangle - \sum_{i=1}^{n} \alpha_{i-} \langle Y_i, v_d \rangle\right) + \|v_d\|^2.$  (29)

The first term of the last equation is equal to

$\sum_{i=1}^{m} \alpha_{i+}^2 \|Z_i\|^2 + 2\sum_{i<j} \alpha_{i+}\alpha_{j+} \langle Z_i, Z_j \rangle + \sum_{i=1}^{n} \alpha_{i-}^2 \|Y_i\|^2 + 2\sum_{i<j} \alpha_{i-}\alpha_{j-} \langle Y_i, Y_j \rangle - 2\sum_{i=1}^{m}\sum_{j=1}^{n} \alpha_{i+}\alpha_{j-} \langle Z_i, Y_j \rangle.$  (30)

Using that $0 \le \alpha_{i+} \le 1$ and the Cauchy-Schwartz inequality we have

$\frac{\alpha_{i+}^2 \|Z_i\|^2}{\|v_d\|^2} \le \frac{\|Z_i\|^2}{d} \cdot \frac{d}{\|v_d\|^2} \xrightarrow{p} \sigma^2 \cdot 0 = 0, \quad \frac{\alpha_{i+}\alpha_{j+} |\langle Z_i, Z_j \rangle|}{\|v_d\|^2} \le \frac{\|Z_i\|}{d^{1/2}} \cdot \frac{\|Z_j\|}{d^{1/2}} \cdot \frac{d}{\|v_d\|^2} \xrightarrow{p} \sigma^2 \cdot 0 = 0$ for $i \ne j$,
as $d \to \infty$. Thus, if we divide the first two terms of (30) by $\|v_d\|^2$ they converge to zero in probability as $d \to \infty$. Analogously, dividing the rest of the terms of (30) by $\|v_d\|^2$, they converge to zero in probability as $d \to \infty$. Note that if we divide the second term on the right-hand side of (29) by $\|v_d\|^2$, due to (27) this term also converges to zero in probability as $d \to \infty$. Thus

$\frac{\|\tilde{v}_d\|^2}{\|v_d\|^2} \xrightarrow{p} 1$ as $d \to \infty$.  (31)

By (28) and (31) we have

$\frac{\langle \tilde{v}_d, v_d \rangle}{\|\tilde{v}_d\|\|v_d\|} = \frac{\langle \tilde{v}_d, v_d \rangle / \|v_d\|^2}{\|\tilde{v}_d\| / \|v_d\|} \xrightarrow{p} 1$  (32)

as $d \to \infty$. Therefore

$\mathrm{Angle}(\tilde{v}_d, v_d) = \arccos\left(\frac{\langle \tilde{v}_d, v_d \rangle}{\|\tilde{v}_d\|\|v_d\|}\right) \xrightarrow{p} 0$ as $d \to \infty$.

Note that for this case the condition (ii) is not necessary.

Case 2: $0 \le c < \infty$. Let $\tilde{v}_d$ be as before. By Lemma 3.1 and (13) it follows that

$\sum_{i=1}^{m} \alpha_{i+}^2 \frac{\|Z_i\|^2}{d} \xrightarrow{p} \sum_{i=1}^{m} \frac{1}{m^2}\sigma^2 = \frac{\sigma^2}{m}, \quad \sum_{i=1}^{n} \alpha_{i-}^2 \frac{\|Y_i\|^2}{d} \xrightarrow{p} \sum_{i=1}^{n} \frac{1}{n^2}\sigma^2 = \frac{\sigma^2}{n},$  (33)

as $d \to \infty$. We have that $\|\tilde{v}_d\|^2$ is given by (29) and the first term of (29) is given by (30). Now, by Lemma 3.1, (20) and (33) we have that (30) divided by $d$ converges in probability to $\gamma\sigma^2$, with $\gamma = \frac{1}{m} + \frac{1}{n}$, as $d \to \infty$. By Lemma 3.1 and (22), the second term of (29) divided by $d$ converges in probability to zero as $d \to \infty$. Thus, since $\|v_d\|^2/d \to c^2$, we have that

$\frac{\|\tilde{v}_d\|^2}{d} \xrightarrow{p} \gamma\sigma^2 + c^2$ as $d \to \infty$.  (34)

Dividing both sides of (25) by $d^{1/2}\|v_d\|$, Lemma 3.1 and (21) imply

$\frac{\langle \tilde{v}_d, v_d \rangle}{d^{1/2}\|v_d\|} \xrightarrow{p} c$ as $d \to \infty$.  (35)

Now, by (34) and (35) we have

$\frac{\langle \tilde{v}_d, v_d \rangle}{\|\tilde{v}_d\|\|v_d\|} = \frac{\langle \tilde{v}_d, v_d \rangle / (d^{1/2}\|v_d\|)}{\|\tilde{v}_d\| / d^{1/2}} \xrightarrow{p} \frac{c}{(\gamma\sigma^2 + c^2)^{1/2}}$  (36)
as $d \to \infty$. Therefore

$\mathrm{Angle}(\tilde{v}_d, v_d) = \arccos\left(\frac{\langle \tilde{v}_d, v_d \rangle}{\|\tilde{v}_d\|\|v_d\|}\right) \xrightarrow{p} \arccos\left(\frac{c}{(\gamma\sigma^2 + c^2)^{1/2}}\right)$  (37)

as $d \to \infty$. Note that if $c = 0$ then $\arccos(c/(\gamma\sigma^2 + c^2)^{1/2}) = \pi/2$.

6.2.2 When $v_d^*$ is the normal vector of the MDP hyperplane

Let $\bar{X}$ and $\bar{Y}$ be the mean vectors of the classes $C_+$ and $C_-$, respectively. Let $u_d = \bar{X} - \bar{Y}$ be the MD normal vector. First, we will show that for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$ we have

$\mathrm{Angle}(X_i - \bar{X}, u_d) \xrightarrow{p} \frac{\pi}{2}, \quad \mathrm{Angle}(Y_j - \bar{Y}, u_d) \xrightarrow{p} \frac{\pi}{2}$ as $d \to \infty$.  (38)

Observe that

$\|X_i - \bar{X}\|^2 = \left\|\left(1 - \frac{1}{m}\right) Z_i - \frac{1}{m}\sum_{j \ne i} Z_j\right\|^2 = \left(1 - \frac{1}{m}\right)^2 \|Z_i\|^2 - 2\left(1 - \frac{1}{m}\right)\frac{1}{m}\sum_{j \ne i} \langle Z_i, Z_j \rangle + \frac{1}{m^2}\sum_{k=1}^{d}\Big(\sum_{j \ne i} Z_j^{(k)}\Big)^2.$

By (13), the first term of the last expression divided by $d$ converges in probability to $(1 - 1/m)^2\sigma^2$ as $d \to \infty$. By (20), the second term divided by $d$ converges in probability to zero as $d \to \infty$. Observe that the third term is equal to

$\frac{1}{m^2}\Big(\sum_{j \ne i} \|Z_j\|^2 + 2\sum_{r,s \ne i,\ r<s} \langle Z_r, Z_s \rangle\Big).$

Therefore, (13) and (20) imply that the third term divided by $d$ converges in probability to $(m-1)\sigma^2/m^2$ as $d \to \infty$. Thus we have that

$\frac{\|X_i - \bar{X}\|^2}{d} \xrightarrow{p} \left(1 - \frac{1}{m}\right)^2\sigma^2 + \frac{m-1}{m^2}\sigma^2 = \frac{m-1}{m}\sigma^2$  (39)

as $d \to \infty$. If $\bar{Z}$ is the mean vector of $Z_1, Z_2, \ldots, Z_m$, then from (13) and (20) it follows that

$\frac{\|\bar{Z}\|^2}{d} \xrightarrow{p} \frac{\sigma^2}{m}$ as $d \to \infty$.  (40)

Observe that

$\langle X_i - \bar{X}, u_d \rangle = \langle Z_i - \bar{Z}, \bar{Z} + v_d - \bar{Y} \rangle = \frac{1}{m}\|Z_i\|^2 + \frac{1}{m}\sum_{j \ne i} \langle Z_i, Z_j \rangle + \langle Z_i, v_d \rangle - \langle Z_i, \bar{Y} \rangle - \|\bar{Z}\|^2 - \langle \bar{Z}, v_d \rangle + \langle \bar{Z}, \bar{Y} \rangle.$

By the last equation, if $c = \infty$, from (13), (20), (21) and (40) it follows that

$\frac{\langle X_i - \bar{X}, u_d \rangle}{d^{1/2}\|v_d\|} \xrightarrow{p} 0$ as $d \to \infty$.  (41)

Therefore, by (39), (41) and since $\|u_d\|/\|v_d\| \xrightarrow{p} 1$ as $d \to \infty$ due to (31), we have that

$\cos[\mathrm{Angle}(X_i - \bar{X}, u_d)] = \frac{\langle X_i - \bar{X}, u_d \rangle}{\|X_i - \bar{X}\|\|u_d\|} = \frac{\langle X_i - \bar{X}, u_d \rangle / (d^{1/2}\|v_d\|)}{(\|X_i - \bar{X}\|/d^{1/2})(\|u_d\|/\|v_d\|)} \xrightarrow{p} 0$  (42)

as $d \to \infty$. Similarly, $\cos[\mathrm{Angle}(Y_j - \bar{Y}, u_d)] \xrightarrow{p} 0$ as $d \to \infty$. Thus we have (38). Analogously, but now using (22) and (34), it can be shown that (38) is also true for $0 \le c < \infty$.

The rest of the proof is based on arguments similar to those in the proof of Theorem 3.1 of Bolivar-Cime and Marron (2013), which are presented below. Define $C_d$ as the matrix whose columns are the vectors $X_i - \bar{X}$, $Y_j - \bar{Y}$, for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. By Ahn and Marron (2010), the normal vector of the MDP method is given by $v_d^* = Q_d u_d / \|Q_d u_d\|$, where $Q_d$ is the symmetric projection matrix onto the orthogonal complement of the column space of $C_d$. By (38), $u_d$ tends to be in the orthogonal complement of the column space of $C_d$. Therefore, when $d$ is large $Q_d u_d$ can be approximated by $u_d$, and $v_d^*$ can be approximated by $u_d/\|u_d\|$. Thus $\cos(\mathrm{Angle}(v_d, v_d^*)) = \langle v_d, v_d^* \rangle / (\|v_d\|\|v_d^*\|)$ can be approximated by

$\frac{\langle u_d, v_d \rangle}{\|u_d\|\|v_d\|}.$  (43)

Hence, as was shown in Section 6.2.1, (43) converges to 1 in probability if $c = \infty$, and it converges to $c/(\gamma\sigma^2 + c^2)^{1/2}$ if $0 \le c < \infty$.

6.3 Assumptions that imply condition (ii) of Theorem 3.1

Now it is shown that each of the conditions (I), (II), (III) and (IV) implies the condition (ii) of Theorem 3.1.

(I) In this case

$\frac{\sum_{k,r=1}^{d} \sigma_{k,r} v_d^{(k)} v_d^{(r)}}{d\|v_d\|^2} = \frac{\sum_{k=1}^{d} \sigma_{k,k} v_d^{(k)2}}{d\|v_d\|^2} \le \frac{M}{d} \to 0$ as $d \to \infty$,

where $M$ is a bound of the entries of $\Sigma_d$.

(II) We have that

$\frac{\sum_{k,r=1}^{d} \sigma_{k,r} v_d^{(k)} v_d^{(r)}}{d\|v_d\|^2} \le \frac{\sum_{k=1}^{d} \sigma_{k,k} v_d^{(k)2}}{d\|v_d\|^2} + \frac{2\sum_{k<r} |\sigma_{k,r}||v_d^{(k)}||v_d^{(r)}|}{d\|v_d\|^2} \le \frac{RM}{d} + \frac{(R-1)RM}{d} \to 0$

as $d \to \infty$, where $R$ is the number of nonzero entries of $v_d$, and $M$ is a bound of the second moments of the entries of $Z_1$. In the last inequality it is used that $|\sigma_{k,r}| \le (\sigma_{k,k}\sigma_{r,r})^{1/2} \le M$, and that $|v_d^{(k)}| \le \|v_d\|$ for all $k$.

(III) Suppose that the entries of $v_d$ are uniformly bounded by $R$; then

$\frac{\sum_{k,l=1}^{d} \sigma_{k,l} v_d^{(k)} v_d^{(l)}}{d\|v_d\|^2} \le \frac{R^2 d}{\|v_d\|^2} \cdot \frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2}.$  (44)

Since $d/\|v_d\|^2 \to c^{-2}$ as $d \to \infty$, if

$\frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2} \to 0$ as $d \to \infty$  (45)

then the right-hand side of (44) tends to zero as $d \to \infty$. Now it is shown that (45) holds under conditions a), b) or c).

a) If $M$ is a bound of the second moments of the entries of $Z_1$, then

$\frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2} = \frac{\sum_{k=1}^{d} \sigma_{k,k}}{d^2} + \frac{\sum_{0<|k-l|<r} |\sigma_{k,l}|}{d^2} + \frac{\sum_{|k-l| \ge r} |\sigma_{k,l}|}{d^2} \le \frac{M}{d} + \frac{(2d-r)(r-1)M}{d^2} + \frac{(d-r)(d-r+1)\rho(r)}{d^2},$  (46)

since $|\sigma_{k,r}| \le (\sigma_{k,k}\sigma_{r,r})^{1/2} \le M$. For any $\epsilon > 0$, taking $r$ large enough and later taking $d$ sufficiently large, the right-hand side of (46) is less than $\epsilon$. Therefore, since $\epsilon > 0$ is arbitrary, (45) follows.

b) If $r$ is the number of nonzero upper diagonals of $\Sigma_d$, and $M$ is a bound of the second moments of
the entries of $Z_1$, then

$\frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2} \le \frac{\sum_{k=1}^{d} \sigma_{k,k}}{d^2} + \frac{2\sum_{k<l} |\sigma_{k,l}|}{d^2} \le \frac{M}{d} + \frac{2r(d-1)M}{d^2};$  (47)

in the last inequality it is used that each upper diagonal has at most $d-1$ nonzero elements, and since there are $r$ nonzero upper diagonals, there are at most $r(d-1)$ nonzero elements of $\Sigma_d$ in the upper diagonals, and $|\sigma_{k,r}| \le (\sigma_{k,k}\sigma_{r,r})^{1/2} \le M$. Therefore, since the right-hand side of the last inequality of (47) tends to zero as $d \to \infty$, (45) follows.

c) Note the following:

$\frac{\sum_{k=1}^{d} \lambda_{k,d}^2}{d^2} = \frac{\sum_{k,l=1}^{d} [E(Z_i^{(k)} Z_i^{(l)})]^2}{d^2} = \frac{\sum_{k,l=1}^{d} \sigma_{k,l}^2}{d^2}.$  (48)

Furthermore, if $\|x\|_p = (\sum_{k=1}^{n} |x^{(k)}|^p)^{1/p}$ is the $p$-norm in $\mathbb{R}^n$, since $\|x\|_1 \le n^{1/2}\|x\|_2$, considering the entries of $\Sigma_d$ as a vector in $\mathbb{R}^{d^2}$ we have

$\frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2} \le \frac{d\left(\sum_{k,l=1}^{d} \sigma_{k,l}^2\right)^{1/2}}{d^2} = \left(\frac{\sum_{k,l=1}^{d} \sigma_{k,l}^2}{d^2}\right)^{1/2}.$

Therefore, by (16) and (48) the right-hand side of the last inequality tends to zero and we have (45).

(IV) In this case

$\frac{\sum_{k,r=1}^{d} \sigma_{k,r} v_d^{(k)} v_d^{(r)}}{d\|v_d\|^2} = \frac{\beta_d^2 \sum_{k,l=1}^{d} \sigma_{k,l}}{d \cdot \beta_d^2 d} \le \frac{\sum_{k,l=1}^{d} |\sigma_{k,l}|}{d^2}.$

Analogously to the case (III), by the last inequality, if (45) holds then (14) follows, and (45) follows from a), b) or c).

6.4 Asymptotic geometric representation of the data in the simulations

In this section we show that the multivariate data considered in the simulation studies have an asymptotic geometric structure as the dimension increases. Since $E(Z_i^{(k)2}) = 1$ and $E[(Z_i^{(k)2} + Z_i^{(k)} - 1)^2] = 3$, for $i = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, d/2$, by the Law
of Large Numbers (LLN) we have

$\frac{\|Z_i\|^2}{d} = \frac{1}{d}\sum_{k=1}^{d/2} Z_i^{(k)2} + \frac{1}{d}\sum_{k=1}^{d/2} (Z_i^{(k)2} + Z_i^{(k)} - 1)^2 \xrightarrow{p} \frac{1+3}{2} = 2$ as $d \to \infty$.

Analogously, since $E[(Z_i^{(k)} - Z_j^{(k)})^2] = 2$ and $E[\{(Z_i^{(k)2} + Z_i^{(k)}) - (Z_j^{(k)2} + Z_j^{(k)})\}^2] = 6$, for $i \ne j$ and $k = 1, 2, \ldots, d/2$, by the LLN we have

$\frac{\|Z_i - Z_j\|^2}{d} = \frac{1}{d}\sum_{k=1}^{d/2} (Z_i^{(k)} - Z_j^{(k)})^2 + \frac{1}{d}\sum_{k=1}^{d/2} [(Z_i^{(k)2} + Z_i^{(k)}) - (Z_j^{(k)2} + Z_j^{(k)})]^2 \xrightarrow{p} \frac{2+6}{2} = 4$ as $d \to \infty$.

Thus the data have an asymptotic geometric representation and satisfy the condition (i) of Theorem 3.1 with $\sigma^2 = 2$.

7 Conclusions

The geometric representation of HDLSS data allows us to analyze the asymptotic behavior of some binary discrimination methods. In particular, under this geometric structure of the data and some conditions, we showed that Support Vector Machine, Distance Weighted Discrimination, Mean Difference and Maximal Data Piling have the same asymptotic behavior, in terms of the angles between the normal vectors of the separating hyperplanes and the optimal direction, as the dimension increases and the sample sizes are fixed. Our results generalize the results of Bolivar-Cime and Marron (2013), where it is showed that the four methods have the same asymptotic behavior in the Gaussian case where the classes have common diagonal covariance matrix. Comparing the asymptotic behavior of the angles between the normal vectors of the separating hyperplanes and the optimal direction with the asymptotic behavior of the error rates, we observe that in some cases the geometric representation of the data allows the consistency property of the error rates to hold even when the normal vectors of the separating hyperplanes are inconsistent with the optimal direction.

Due to the asymptotic geometric structure of the data, the classification with these four methods is easy when the distance between the two classes is large and more difficult when it is small, since as the distance between the two classes increases, the angles between the normal vectors and the optimal direction tend to zero and the error rates approach zero faster when the dimension tends to infinity. In the simulation study presented here, where the sample sizes of the two classes were the same, the conclusions in terms of the angles between the normal vectors and the optimal direction and the conclusions in terms of error rates were similar: although the four methods have the same asymptotic behavior as the
dimension tends to infinity, generally for large dimensions the two methods with the best behavior were MD and DWD, the third best method was SVM, and the worst was MDP. The MD method had the best behavior in terms of the angles of the normal vectors in most of the cases; this is because the asymptotic optimal direction for the normal vector of the separating hyperplane is the difference between the two class means. The results in terms of error rates for unequal sample sizes of the classes are totally different, since in this case, as the dimension increases, the methods MD, SVM and MDP practically coincide, and the DWD method is substantially worse than these methods. However, if the distance between the two classes is sufficiently large, the error rates of the four methods tend to zero as the dimension tends to infinity. As we have observed, the same asymptotic behavior of the error rates of the four methods is related to the same asymptotic behavior of the normal vectors of the four methods, given by the results presented here.

Acknowledgements

The authors are grateful to Aroldo Perez Perez and Victor Perez-Abreu for their help to improve an early version of this manuscript. They also thank the Editor Prof. N. Balakrishnan and the anonymous referees for their important comments and valuable suggestions, which helped greatly to improve this work. The authors thank the Universidad Juárez Autónoma de Tabasco for the support provided during the elaboration of this paper. This work was financed by PRODEP, Grant UJAT-PTC-178.

References

Ahn, J. and Marron, J. S. (2010). The Maximal Data Piling Direction for Discrimination. Biometrika, 97(1).

Ahn, J., Marron, J. S., Muller, K. M., and Chi, Y. (2007). The High-dimension, Low-sample-size Geometric Representation Holds Under Mild Conditions. Biometrika, 94(3).

Aoshima, M. and Yata, K. (2014). A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data. Ann. Inst. Stat. Math., 66.

Bolivar-Cime, A. and Marron, J. S. (2013). Comparison of binary discrimination methods for high dimension low sample size data. J. Multivar. Anal., 115.

Chan, Y.-B. and Hall, P. (2009). Scale adjustments for classifiers in high-dimensional, low sample size settings. Biometrika, 96.
Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press, Cambridge, U.K.

Hall, P., Marron, J. S., and Neeman, A. (2005). Geometric Representation of High Dimension, Low Sample Size Data. J. R. Statist. Soc. B, 67(3).

Jung, S. and Marron, J. S. (2009). PCA Consistency in High Dimension, Low Sample Size Context. Ann. Statist., 37(6B).

Jung, S., Sen, A., and Marron, J. S. (2012). Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA. J. Multivar. Anal., 109.

Marron, J. S. (2015). Distance-weighted discrimination. WIREs Comput. Stat., 7.

Marron, J. S., Todd, M. J., and Ahn, J. (2007). Distance-Weighted Discrimination. J. Am. Statist. Ass., 102(480).

Nakayama, Y., Yata, K., and Aoshima, M. (2017). Support vector machine and its bias correction in high-dimension, low-sample-size settings. J. Stat. Plan. Inference, in press (arXiv preprint).

Qiao, X., Zhang, H. H., Liu, Y., Todd, M. J., and Marron, J. S. (2010). Weighted Distance Weighted Discrimination and Its Asymptotic Properties. J. Am. Statist. Ass., 105(489).

Qiao, X. and Zhang, L. (2015). Flexible high-dimensional classification machines and their asymptotic properties. J. Mach. Learn. Res., 16:1547-1572.

Scholkopf, B. and Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. The MIT Press, Cambridge, Massachusetts.

Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer-Verlag, Berlin.

Yata, K. and Aoshima, M. (2012). Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. J. Multivar. Anal., 105(1).


More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

Separation of Variables

Separation of Variables Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical

More information

Homework 3 - Solutions

Homework 3 - Solutions Homework 3 - Solutions The Transpose an Partial Transpose. 1 Let { 1, 2,, } be an orthonormal basis for C. The transpose map efine with respect to this basis is a superoperator Γ that acts on an operator

More information

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH English NUMERICAL MATHEMATICS Vol14, No1 Series A Journal of Chinese Universities Feb 2005 TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH He Ming( Λ) Michael K Ng(Ξ ) Abstract We

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

Optimal CDMA Signatures: A Finite-Step Approach

Optimal CDMA Signatures: A Finite-Step Approach Optimal CDMA Signatures: A Finite-Step Approach Joel A. Tropp Inst. for Comp. Engr. an Sci. (ICES) 1 University Station C000 Austin, TX 7871 jtropp@ices.utexas.eu Inerjit. S. Dhillon Dept. of Comp. Sci.

More information

Convergence rates of moment-sum-of-squares hierarchies for optimal control problems

Convergence rates of moment-sum-of-squares hierarchies for optimal control problems Convergence rates of moment-sum-of-squares hierarchies for optimal control problems Milan Kora 1, Diier Henrion 2,3,4, Colin N. Jones 1 Draft of September 8, 2016 Abstract We stuy the convergence rate

More information

MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX

MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX J. WILLIAM HELTON, JEAN B. LASSERRE, AND MIHAI PUTINAR Abstract. We investigate an iscuss when the inverse of a multivariate truncate moment matrix

More information

arxiv: v4 [math.pr] 27 Jul 2016

arxiv: v4 [math.pr] 27 Jul 2016 The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,

More information

Generalization of the persistent random walk to dimensions greater than 1

Generalization of the persistent random walk to dimensions greater than 1 PHYSICAL REVIEW E VOLUME 58, NUMBER 6 DECEMBER 1998 Generalization of the persistent ranom walk to imensions greater than 1 Marián Boguñá, Josep M. Porrà, an Jaume Masoliver Departament e Física Fonamental,

More information

Conservation Laws. Chapter Conservation of Energy

Conservation Laws. Chapter Conservation of Energy 20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action

More information

Entanglement is not very useful for estimating multiple phases

Entanglement is not very useful for estimating multiple phases PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The

More information

Unit vectors with non-negative inner products

Unit vectors with non-negative inner products Unit vectors with non-negative inner proucts Bos, A.; Seiel, J.J. Publishe: 01/01/1980 Document Version Publisher s PDF, also known as Version of Recor (inclues final page, issue an volume numbers) Please

More information

LeChatelier Dynamics

LeChatelier Dynamics LeChatelier Dynamics Robert Gilmore Physics Department, Drexel University, Philaelphia, Pennsylvania 1914, USA (Date: June 12, 28, Levine Birthay Party: To be submitte.) Dynamics of the relaxation of a

More information

12.11 Laplace s Equation in Cylindrical and

12.11 Laplace s Equation in Cylindrical and SEC. 2. Laplace s Equation in Cylinrical an Spherical Coorinates. Potential 593 2. Laplace s Equation in Cylinrical an Spherical Coorinates. Potential One of the most important PDEs in physics an engineering

More information

Modelling and simulation of dependence structures in nonlife insurance with Bernstein copulas

Modelling and simulation of dependence structures in nonlife insurance with Bernstein copulas Moelling an simulation of epenence structures in nonlife insurance with Bernstein copulas Prof. Dr. Dietmar Pfeifer Dept. of Mathematics, University of Olenburg an AON Benfiel, Hamburg Dr. Doreen Straßburger

More information

3 The variational formulation of elliptic PDEs

3 The variational formulation of elliptic PDEs Chapter 3 The variational formulation of elliptic PDEs We now begin the theoretical stuy of elliptic partial ifferential equations an bounary value problems. We will focus on one approach, which is calle

More information

SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES

SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES Communications on Stochastic Analysis Vol. 2, No. 2 (28) 289-36 Serials Publications www.serialspublications.com SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES

More information

LECTURE NOTES ON DVORETZKY S THEOREM

LECTURE NOTES ON DVORETZKY S THEOREM LECTURE NOTES ON DVORETZKY S THEOREM STEVEN HEILMAN Abstract. We present the first half of the paper [S]. In particular, the results below, unless otherwise state, shoul be attribute to G. Schechtman.

More information

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016 Noname manuscript No. (will be inserte by the eitor) Scaling properties of the number of ranom sequential asorption iterations neee to generate saturate ranom packing arxiv:607.06668v2 [con-mat.stat-mech]

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

A. Exclusive KL View of the MLE

A. Exclusive KL View of the MLE A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function

More information

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x) Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)

More information

Tractability results for weighted Banach spaces of smooth functions

Tractability results for weighted Banach spaces of smooth functions Tractability results for weighte Banach spaces of smooth functions Markus Weimar Mathematisches Institut, Universität Jena Ernst-Abbe-Platz 2, 07740 Jena, Germany email: markus.weimar@uni-jena.e March

More information

MAT 545: Complex Geometry Fall 2008

MAT 545: Complex Geometry Fall 2008 MAT 545: Complex Geometry Fall 2008 Notes on Lefschetz Decomposition 1 Statement Let (M, J, ω) be a Kahler manifol. Since ω is a close 2-form, it inuces a well-efine homomorphism L: H k (M) H k+2 (M),

More information

WUCHEN LI AND STANLEY OSHER

WUCHEN LI AND STANLEY OSHER CONSTRAINED DYNAMICAL OPTIMAL TRANSPORT AND ITS LAGRANGIAN FORMULATION WUCHEN LI AND STANLEY OSHER Abstract. We propose ynamical optimal transport (OT) problems constraine in a parameterize probability

More information

ON ISENTROPIC APPROXIMATIONS FOR COMPRESSIBLE EULER EQUATIONS

ON ISENTROPIC APPROXIMATIONS FOR COMPRESSIBLE EULER EQUATIONS ON ISENTROPIC APPROXIMATIONS FOR COMPRESSILE EULER EQUATIONS JUNXIONG JIA AND RONGHUA PAN Abstract. In this paper, we first generalize the classical results on Cauchy problem for positive symmetric quasilinear

More information

Quantile function expansion using regularly varying functions

Quantile function expansion using regularly varying functions Quantile function expansion using regularly varying functions arxiv:705.09494v [math.st] 9 Aug 07 Thomas Fung a, an Eugene Seneta b a Department of Statistics, Macquarie University, NSW 09, Australia b

More information

Abstract A nonlinear partial differential equation of the following form is considered:

Abstract A nonlinear partial differential equation of the following form is considered: M P E J Mathematical Physics Electronic Journal ISSN 86-6655 Volume 2, 26 Paper 5 Receive: May 3, 25, Revise: Sep, 26, Accepte: Oct 6, 26 Eitor: C.E. Wayne A Nonlinear Heat Equation with Temperature-Depenent

More information

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

Analyzing Tensor Power Method Dynamics in Overcomplete Regime Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical

More information

Monotonicity of facet numbers of random convex hulls

Monotonicity of facet numbers of random convex hulls Monotonicity of facet numbers of ranom convex hulls Gilles Bonnet, Julian Grote, Daniel Temesvari, Christoph Thäle, Nicola Turchi an Florian Wespi arxiv:173.31v1 [math.mg] 7 Mar 17 Abstract Let X 1,...,

More information

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

Linear First-Order Equations

Linear First-Order Equations 5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)

More information

On conditional moments of high-dimensional random vectors given lower-dimensional projections

On conditional moments of high-dimensional random vectors given lower-dimensional projections Submitte to the Bernoulli arxiv:1405.2183v2 [math.st] 6 Sep 2016 On conitional moments of high-imensional ranom vectors given lower-imensional projections LUKAS STEINBERGER an HANNES LEEB Department of

More information

Center of Gravity and Center of Mass

Center of Gravity and Center of Mass Center of Gravity an Center of Mass 1 Introuction. Center of mass an center of gravity closely parallel each other: they both work the same way. Center of mass is the more important, but center of gravity

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

REAL ANALYSIS I HOMEWORK 5

REAL ANALYSIS I HOMEWORK 5 REAL ANALYSIS I HOMEWORK 5 CİHAN BAHRAN The questions are from Stein an Shakarchi s text, Chapter 3. 1. Suppose ϕ is an integrable function on R with R ϕ(x)x = 1. Let K δ(x) = δ ϕ(x/δ), δ > 0. (a) Prove

More information

Sturm-Liouville Theory

Sturm-Liouville Theory LECTURE 5 Sturm-Liouville Theory In the three preceing lectures I emonstrate the utility of Fourier series in solving PDE/BVPs. As we ll now see, Fourier series are just the tip of the iceberg of the theory

More information

Generalized Tractability for Multivariate Problems

Generalized Tractability for Multivariate Problems Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,

More information

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2.

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2. Assignment 13 Exercise 8.4 For the hypotheses consiere in Examples 8.12 an 8.13, the sign test is base on the statistic N + = #{i : Z i > 0}. Since 2 n(n + /n 1) N(0, 1) 2 uner the null hypothesis, the

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum October 6, 4 ARDB Note Analytic Scaling Formulas for Crosse Laser Acceleration in Vacuum Robert J. Noble Stanfor Linear Accelerator Center, Stanfor University 575 San Hill Roa, Menlo Park, California 945

More information

Lower Bounds for k-distance Approximation

Lower Bounds for k-distance Approximation Lower Bouns for k-distance Approximation Quentin Mérigot March 21, 2013 Abstract Consier a set P of N ranom points on the unit sphere of imension 1, an the symmetrize set S = P ( P). The halving polyheron

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

Math 300 Winter 2011 Advanced Boundary Value Problems I. Bessel s Equation and Bessel Functions

Math 300 Winter 2011 Advanced Boundary Value Problems I. Bessel s Equation and Bessel Functions Math 3 Winter 2 Avance Bounary Value Problems I Bessel s Equation an Bessel Functions Department of Mathematical an Statistical Sciences University of Alberta Bessel s Equation an Bessel Functions We use

More information

The total derivative. Chapter Lagrangian and Eulerian approaches

The total derivative. Chapter Lagrangian and Eulerian approaches Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function

More information

PDE Notes, Lecture #11

PDE Notes, Lecture #11 PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =

More information

Combinatorica 9(1)(1989) A New Lower Bound for Snake-in-the-Box Codes. Jerzy Wojciechowski. AMS subject classification 1980: 05 C 35, 94 B 25

Combinatorica 9(1)(1989) A New Lower Bound for Snake-in-the-Box Codes. Jerzy Wojciechowski. AMS subject classification 1980: 05 C 35, 94 B 25 Combinatorica 9(1)(1989)91 99 A New Lower Boun for Snake-in-the-Box Coes Jerzy Wojciechowski Department of Pure Mathematics an Mathematical Statistics, University of Cambrige, 16 Mill Lane, Cambrige, CB2

More information

A LIMIT THEOREM FOR RANDOM FIELDS WITH A SINGULARITY IN THE SPECTRUM

A LIMIT THEOREM FOR RANDOM FIELDS WITH A SINGULARITY IN THE SPECTRUM Teor Imov r. ta Matem. Statist. Theor. Probability an Math. Statist. Vip. 81, 1 No. 81, 1, Pages 147 158 S 94-911)816- Article electronically publishe on January, 11 UDC 519.1 A LIMIT THEOREM FOR RANDOM

More information

Iterated Point-Line Configurations Grow Doubly-Exponentially

Iterated Point-Line Configurations Grow Doubly-Exponentially Iterate Point-Line Configurations Grow Doubly-Exponentially Joshua Cooper an Mark Walters July 9, 008 Abstract Begin with a set of four points in the real plane in general position. A to this collection

More information

The Generalized Incompressible Navier-Stokes Equations in Besov Spaces

The Generalized Incompressible Navier-Stokes Equations in Besov Spaces Dynamics of PDE, Vol1, No4, 381-400, 2004 The Generalize Incompressible Navier-Stokes Equations in Besov Spaces Jiahong Wu Communicate by Charles Li, receive July 21, 2004 Abstract This paper is concerne

More information

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS

THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS BY ANDREW F. MAGYAR A issertation submitte to the Grauate School New Brunswick Rutgers,

More information

Euler equations for multiple integrals

Euler equations for multiple integrals Euler equations for multiple integrals January 22, 2013 Contents 1 Reminer of multivariable calculus 2 1.1 Vector ifferentiation......................... 2 1.2 Matrix ifferentiation........................

More information

arxiv:hep-th/ v1 3 Feb 1993

arxiv:hep-th/ v1 3 Feb 1993 NBI-HE-9-89 PAR LPTHE 9-49 FTUAM 9-44 November 99 Matrix moel calculations beyon the spherical limit arxiv:hep-th/93004v 3 Feb 993 J. Ambjørn The Niels Bohr Institute Blegamsvej 7, DK-00 Copenhagen Ø,

More information

Diagonalization of Matrices Dr. E. Jacobs

Diagonalization of Matrices Dr. E. Jacobs Diagonalization of Matrices Dr. E. Jacobs One of the very interesting lessons in this course is how certain algebraic techniques can be use to solve ifferential equations. The purpose of these notes is

More information

Jointly continuous distributions and the multivariate Normal

Jointly continuous distributions and the multivariate Normal Jointly continuous istributions an the multivariate Normal Márton alázs an álint Tóth October 3, 04 This little write-up is part of important founations of probability that were left out of the unit Probability

More information

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7. Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency

More information

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13)

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13) Slie10 Haykin Chapter 14: Neuroynamics (3r E. Chapter 13) CPSC 636-600 Instructor: Yoonsuck Choe Spring 2012 Neural Networks with Temporal Behavior Inclusion of feeback gives temporal characteristics to

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

All s Well That Ends Well: Supplementary Proofs

All s Well That Ends Well: Supplementary Proofs All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee

More information

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number

More information

Characterizing Real-Valued Multivariate Complex Polynomials and Their Symmetric Tensor Representations

Characterizing Real-Valued Multivariate Complex Polynomials and Their Symmetric Tensor Representations Characterizing Real-Value Multivariate Complex Polynomials an Their Symmetric Tensor Representations Bo JIANG Zhening LI Shuzhong ZHANG December 31, 2014 Abstract In this paper we stuy multivariate polynomial

More information

The effect of dissipation on solutions of the complex KdV equation

The effect of dissipation on solutions of the complex KdV equation Mathematics an Computers in Simulation 69 (25) 589 599 The effect of issipation on solutions of the complex KV equation Jiahong Wu a,, Juan-Ming Yuan a,b a Department of Mathematics, Oklahoma State University,

More information

Robustness and Perturbations of Minimal Bases

Robustness and Perturbations of Minimal Bases Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important

More information