IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY 2007

The Concentration of Fractional Distances

Damien François, Vincent Wertz, Member, IEEE, and Michel Verleysen, Senior Member, IEEE

Abstract—Nearest neighbor search and many other numerical data analysis tools most often rely on the use of the euclidean distance. When data are high dimensional, however, the euclidean distances seem to concentrate; all distances between pairs of data elements seem to be very similar. Therefore, the relevance of the euclidean distance has been questioned in the past, and fractional norms (Minkowski-like norms with an exponent less than one) were introduced to fight the concentration phenomenon. This paper justifies the use of alternative distances to fight concentration by showing that the concentration is indeed an intrinsic property of the distances and not an artifact from a finite sample. Furthermore, an estimation of the concentration as a function of the exponent of the distance and of the distribution of the data is given. It leads to the conclusion that, contrary to what is generally admitted, fractional norms are not always less concentrated than the euclidean norm; a counterexample is given to prove this claim. Theoretical arguments are presented, which show that the concentration phenomenon can appear for real data that do not match the hypotheses of the theorems, in particular, the assumption of independent and identically distributed variables. Finally, some insights about how to choose an optimal metric are given.

Index Terms—Nearest neighbor search, high-dimensional data, distance concentration, fractional distances.

D. François and V. Wertz are with the Department of Mathematical Engineering, Université catholique de Louvain, Georges Lemaître 4, B-1348 Louvain-la-Neuve, Belgium. E-mail: {francois, wertz}@inma.ucl.ac.be. M. Verleysen is with the Microelectronics Laboratory, Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium. E-mail: verleysen@dice.ucl.ac.be. Manuscript received Aug. 2005; revised Sept. 2006; accepted Jan. 2007; published online Mar. 2007. For information on obtaining reprints of this article, please send e-mail to tkde@computer.org. Digital Object Identifier no. 10.1109/TKDE.

1 INTRODUCTION

THE search for nearest neighbors (NNs) is a crucial task in data management and data analysis. Content-based data retrieval systems, for example, use the distance between a query from the user and each element in a database to retrieve the most similar data [1]. However, the distances between elements can also be used to automatically classify data [2] or to identify clusters (natural groups of similar data) in the data set [3]. To define a measure of distance among data, the latter are often described in a euclidean space through a feature vector, and the euclidean distance is used. The euclidean distance is the euclidean norm of the difference between two vectors and is supposed to reflect the notion of similarity between them.

Nowadays, most data are getting more and more complex in the sense that a large number of features is needed to describe them; they are said to be high dimensional. For example, pictures taken by a standard camera consist of two to five million pixels, digital books contain thousands of words, DNA sequences are composed of tens of thousands of bases, and so forth. High-dimensional data must obviously be described in a high-dimensional space. In those spaces, however, the norm used to define the distance has the strange property of concentrating [4], [5]. As a consequence, all pairwise distances in a high-dimensional data set seem to be equal or at least
very similar. This may lead to problems when searching for NNs [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22]. Therefore, the relevance of the norm, specifically of the euclidean norm, is questioned when measuring high-dimensional data similarity. Some literature suggests using fractional norms, an extension of the euclidean norm, to counter concentration effects [23], [24]. Fractional norms have been studied and used [25], [26], [27], [28], [29], but many fundamental questions remain open about the concentration phenomenon. Does the concentration phenomenon occur when the data do not match the hypotheses of the theorems about concentration? Is the concentration phenomenon really intrinsic to the norm/distance, or can it be caused by some other counterintuitive effect of the high dimensionality? Are fractional norms always less concentrated than higher order norms? The aim of this paper is to provide answers to these questions. In particular, this paper will show that the norm of normalized data will concentrate even if the latter do not match the independent and identically distributed (i.i.d.) hypothesis. It will also show that the concentration phenomenon occurs even when an infinite number of data points is considered. Finally, this paper shows that, in contrast to what is generally acknowledged in the literature, fractional norms are not always less concentrated than higher order norms.

This paper is organized as follows: Section 2 introduces the concentration phenomenon and reviews the state of the art. Section 3 evokes the link between the concentration of the norm and the curse of dimensionality in database indexing. Section 4 discusses the limitations of current results and extends them. For the sake of clarity, this section will only present the new results; proofs and illustrations are gathered in Section 5. Finally, Section 6 proposes some ways to choose an optimal metric in particular situations.

Fig. 1. From bottom to top: minimum observed value, average minus standard deviation, average value, average plus standard deviation, maximum observed value, and maximum possible value of the euclidean norm of a random vector. The expectation grows, but the variance remains constant. A small subinterval of the domain of the norm is reached in practice.

2 STATE OF THE ART

The concentration of distances in high-dimensional spaces is a rather counterintuitive phenomenon. It can be roughly stated as follows: In a high-dimensional space, all pairwise distances between points seem identical. This paper will study the concentration of the distances through the concentration of the norm of random vectors, as done in [4], [24], [30], and [31]. Given a data set supposedly drawn from a random variable Z, we consider Z′ that is distributed as Z and define X = Z − Z′. The probability density function of X is the convolution of the respective probability density functions of Z and Z′. Studying the distribution of distances in the data set produced by Z, ||Z − Z′||, is equivalent to studying the distribution of ||X||, the norm of X. Similarly, studying the pairwise distances in a data set with n elements can be done by first building another data set with n(n+1)/2 elements corresponding to all pairs, whose attributes are the differences between two elements, and then studying the norm of this new data set. This section illustrates the phenomenon and gives an overview of the main known results about the concentration of the norm/distance.

2.1 An Intuitive View of the Concentration Phenomenon

The following experiment will help in introducing the concept of concentration for the norm in high-dimensional spaces. The aim is to observe that the norms of high-dimensional vectors tend to be very similar to one another. Let X = (X₁, ..., X_d) be a d-dimensional random vector taking values in the unit cube [0,1]^d. Each component X_i will be referred to as variable X_i. We will denote by X ~ F a random vector X distributed according to the multivariate probability density function F. Let Ω = {x^(j)}, 1 ≤ j ≤ n, ⊂ IR^d be a finite sample drawn from X, that is, a set of independent realizations of X. We consider the set of all the norms ||x^(j)||. Obviously, the values of ||x^(j)|| are bounded: ||x^(j)|| ∈ [0, M], where M = ||(1, ..., 1)||. In low-dimensional spaces (d < 10), if n is not too small, min_j ||x^(j)|| will be close to zero, and max_j ||x^(j)|| will be close to M. However, in higher dimensional spaces (d > 10), this is not verified anymore. Fig. 1 shows the average value, empirical standard deviation, and maximum and minimum observed values of the norm of a uniformly randomly drawn sample of size n = 10^5 in spaces of growing dimension. The euclidean norm is considered, so M = √d. We can observe that the average value of the norm increases with the dimension, whereas the standard deviation seems rather constant. When the dimension is low (Fig. 1a), we can see that the minimum and maximum observed values are close to the bounds of the domain of the norm, respectively, 0 and √d. When the dimension is large, say, from dimension 10 onward, the maximum and minimum observed values tend to move away from the bounds. Indeed, even with a large number of points (10^5), all the observed norms seem to concentrate in a small portion of their domain. In addition, this portion gets smaller and smaller as the dimension grows, when compared to the size of the total domain.
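This experiment is straightforward to reproduce. The following minimal simulation (our sketch, not the authors' code; the sample size is reduced from the paper's 10^5 to keep it fast) exhibits the behavior plotted in Fig. 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # the paper uses n = 10^5; reduced here for speed

for d in [2, 10, 100, 1000]:
    x = rng.random((n, d))             # uniform sample in the unit cube [0,1]^d
    norms = np.linalg.norm(x, axis=1)  # euclidean norms, bounded by sqrt(d)
    print(f"d={d:5d}  bound={np.sqrt(d):6.2f}  mean={norms.mean():6.2f}  "
          f"std={norms.std():.3f}  range=[{norms.min():.2f}, {norms.max():.2f}]")

# The mean grows like sqrt(d) while the standard deviation stays nearly
# constant, so the observed norms occupy a shrinking fraction of the
# possible range [0, sqrt(d)].
```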
This phenomenon is referred to as the concentration of the norm. Section 2.3 will review some results from the literature about the concentration of the norm in high-dimensional spaces; Minkowski norms will be introduced first, in Section 2.2.

2.2 The Euclidean Norm and the Minkowski Family

The Minkowski norms form a family of norms parametrized by their exponent p = 1, 2, ...:

$$\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}. \qquad (1)$$

When p = 2, the Minkowski norm corresponds to the euclidean one, which induces the euclidean distance. For p = 1, it is sometimes called sum-norm and induces the Manhattan or city-block metric. The limit for p → ∞ is the max-norm, which induces the Chebychev metric. In the sequel, we will consider extensions of Minkowski norms to the case where p is a positive real number. For p ≥ 1, those extensions are indeed norms, but for 0 < p < 1, the triangle inequality does not hold and, hence, they do not deserve the name; they are sometimes called prenorms. Actually, the inequality is reversed. A consequence is that the straight line is no longer the shortest path between two points, which may seem counterintuitive. In the remainder, we will denote by p-norm a norm or prenorm of the form (1) with p ∈ IR⁺. We will call a fractional norm a p-norm with p < 1.

Fig. 2. Two-dimensional unit balls for several values of the parameter p of the p-norm.

Fig. 2 depicts the two-dimensional unit balls (that is, the set of x for which ||x||_p = 1) for several values of p, from fractional exponents up to infinity. We can see that, for p ≥ 1, the balls are convex; for 0 < p < 1, however, they are not. In Sections 2.3, 2.4, and 2.5, results from the literature about the concentration of Minkowski norms and fractional p-norms will be presented.

2.3 Concentration of the Euclidean Norm

In the experiment described at the beginning of this section, we observed that the expectation of the norm of a random vector increases with the dimension, whereas its standard deviation (and, hence, its variance) remains rather constant. Demartines [4] has theoretically confirmed this fact.

Theorem 1 (Demartines). Let X ∈ IR^d be a random vector with i.i.d. components X_i ~ F. Then,

$$E(\|X\|_2) = \sqrt{ad - b} + O(1/d) \qquad (2)$$

and

$$\operatorname{Var}(\|X\|_2) = b + O(1/\sqrt{d}), \qquad (3)$$

where a and b are constants that do not depend on the dimension.

The theorem is valid whatever the distribution F of the X_i might be. Different distributions will lead to different values for a and b, but the asymptotic results remain. The theorem proves that the expectation of the euclidean norm of random vectors increases as the square root of the dimension, whereas its variance is constant and independent of the dimension. Therefore, when the dimension is large, the variance of the norm is very small compared with its expected value. Demartines concludes that when the dimension is large, vectors seem normalized: The relative error made while considering E(||X||₂) instead of the real value of ||X||₂ becomes negligible. As a consequence, high-dimensional vectors appear to be distributed on a sphere of radius E(||X||₂). Demartines also notes that, since the euclidean distance is the norm of the difference between two random vectors, its expectation and variance follow laws (2) and (3); pairwise distances between points in high-dimensional spaces seem to be all identical. Demartines mentions that if the X_i are not independent, the results are still valid provided that we replace d with the actual number of degrees of freedom. The result from Demartines' work is interesting in that it confirms the experimental results, but it is restricted to the euclidean norm and makes the rather strong hypothesis of independent and identical distributions.

2.4 Concentration of Arbitrary Norms

Independent of the results of Demartines' work, Beyer et al. explored the effect of dimensionality on the NN problem [5]. Whereas Demartines defines a data set as consisting of n independent draws x^(j) from a single random vector X, Beyer et al. consider n random vectors P^(j); a data set is then made of one realization of each random vector. The main result of Beyer et al.'s work is the following theorem. The original theorem is stated for an arbitrary distance measure; it is rewritten here with norms for illustration purposes.
Theorem 2 (Beyer et al., adapted). Let P^(j), 1 ≤ j ≤ n, be n d-dimensional i.i.d. random vectors. If

$$\lim_{d\to\infty} \operatorname{Var}\!\left(\frac{\|P^{(j)}\|}{E(\|P^{(j)}\|)}\right) = 0, \qquad (4)$$

then, for any ε > 0,

$$\lim_{d\to\infty} P\!\left[\frac{\max_j \|P^{(j)}\| - \min_j \|P^{(j)}\|}{\min_j \|P^{(j)}\|} \le \varepsilon\right] = 1.$$

The theorem is interpreted as follows: Suppose a set of n data points, randomly distributed in the d-dimensional space. Some query point is supposed to be located at the origin, without loss of generality. Then, if hypothesis (4) is satisfied, independent of the distribution of the components of the P^(j), the difference between the largest and smallest distances to the query point becomes smaller and smaller when compared with the smallest distance when the dimension increases. The ratio

$$\frac{\max_j \|P^{(j)}\| - \min_j \|P^{(j)}\|}{\min_j \|P^{(j)}\|}$$

is called the relative contrast.
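This shrinking contrast is easy to observe numerically. The sketch below (illustrative only; the uniform distribution, sample sizes, and exponent grid are our choices) implements p-norms, including fractional exponents, and tracks the relative contrast to a query at the origin as the dimension grows:

```python
import numpy as np

def p_norm(x, p):
    """(sum_i |x_i|^p)^(1/p): a true norm for p >= 1, a prenorm for 0 < p < 1."""
    return np.sum(np.abs(x) ** p, axis=-1) ** (1.0 / p)

rng = np.random.default_rng(0)
n = 1000  # data points; the query point sits at the origin

for d in [2, 20, 200, 2000]:
    x = rng.random((n, d))
    for p in [0.5, 1, 2]:
        dist = p_norm(x, p)
        contrast = (dist.max() - dist.min()) / dist.min()
        print(f"d={d:5d}  p={p:3}  relative contrast={contrast:8.3f}")

# For every exponent the relative contrast shrinks as d grows; at a fixed
# (large) dimension, smaller exponents keep more contrast on uniform data,
# consistent with Theorems 3 and 4.
```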

Beyer et al. conclude that all points seem to be located at approximately the same distance from the query point; the concept of NN in a high-dimensional space is then less intuitive than in a lower dimensional one. Beyer et al. explore some scenarios that satisfy (4) and some that do not. A proof that Minkowski norms and fractional norms satisfy the hypothesis will be given in Section 4.

2.5 Concentration of Minkowski Norms

Hinneburg et al. focused on the search for NNs as well [23]. They produced the following theorem relative to Minkowski norms.

Theorem 3 (Hinneburg et al.). Let P^(j), 1 ≤ j ≤ n, be n d-dimensional i.i.d. random vectors and ||·||_p be the Minkowski norm with exponent p. If the P^(j) are distributed over [0,1]^d, then there exists a constant C_p independent of the distribution of the P^(j) such that

$$C_p \le \lim_{d\to\infty} E\!\left(\frac{\max_j \|P^{(j)}\|_p - \min_j \|P^{(j)}\|_p}{d^{1/p-1/2}}\right) \le (n-1)\,C_p. \qquad (5)$$

Hinneburg et al. note that a consequence of (5) is the surprising fact that, on average, the contrast

$$\max_j \|P^{(j)}\|_p - \min_j \|P^{(j)}\|_p \qquad (6)$$

grows as d^{1/p−1/2}. They further conclude that, as a result, the contrast converges to a constant when the dimension increases and the euclidean distance is used: For the L₁ norm, it increases as √d; for the euclidean norm (p = 2), it remains constant; and for norms with p ≥ 3, it tends toward zero. Hinneburg et al. conclude that the NN search in a high-dimensional space tends to be meaningless for Minkowski norms with exponent larger than or equal to 3, since the maximum observed distance tends toward the minimum one: The distance has lost its discriminative power between the notions of close and far.

The conclusion we can draw from this theorem is the following: On average, the ratio between the contrast and d^{1/p−1/2} is bounded. However, the bounds themselves depend on the value of p. Furthermore, if the number of points n is large, the upper bound may be very large too. In practice, though, it may appear that the value of the ratio is much closer to the lower bound than to the upper one. A result independent of the number of points is presented in Section 4.

Yianilos [31] mentions that the standard deviation of the p-norm, p ≥ 1, of a uniformly distributed random vector is Θ(d^{1/p−1/2}) and sketches a proof. In contrast to the approaches by Beyer et al. and Hinneburg et al., Yianilos considers random vector distributions rather than realizations of random vectors. He also asserts that the i.i.d. hypothesis can be weakened. He then draws the same conclusions as Hinneburg et al. and investigates the consequences for excluded middle vantage point forests, methods for similarity search in metric spaces. We will see a similar result in Section 4, without the restrictions on the uniform distribution and on the values of p.

2.6 Concentration of Fractional Norms

Aggarwal et al. extend Hinneburg's result to fractional p-norms [24]. In a sense, if p = 2 is better than p = 3 and p = 1 is better than p = 2, why not look at p < 1 to see if it is better than p = 1? Aggarwal et al. produced the following theorem.

Theorem 4 (Aggarwal et al.). Let P^(j), 1 ≤ j ≤ n, be n d-dimensional independent random vectors, uniformly distributed over [0,1]^d. There exists a constant C independent of p and d such that

$$C\sqrt{\frac{1}{2p+1}} \;\le\; \lim_{d\to\infty} E\!\left(\frac{\max_j \|P^{(j)}\|_p - \min_j \|P^{(j)}\|_p}{d^{1/p-1/2}}\right) \;\le\; (n-1)\,C\sqrt{\frac{1}{2p+1}}. \qquad (7)$$
Aggarwal et al. note that the constant 1/(2p+1) may play a valuable role in affecting the relative contrast and confirmed it experimentally with synthetic data sets. They then conclude that, on average, fractional norms provide better contrast than Minkowski norms in terms of relative distances. The next section will provide a similar result independent of the number of points. That result will show that the conclusions of Theorem 4 cannot be extended to the case where the data are not uniformly distributed.

Independent of Aggarwal et al., Skala [30] has shown that the ratio

$$\gamma(d) = \frac{E(\|X\|_p)^2}{\operatorname{Var}(\|X\|_p)}$$

increases linearly with dimension d. Here, X is a random vector whose components are i.i.d. The theorem does not require uniform densities; however, it is exact only in the p = 2 case and gives an approximate result for other values of p.

3 CONCENTRATION AND SIMILARITY SEARCH IN DATABASES

In order to better understand the impact of concentration on NN search and indexing, this section reviews some example studies available in the literature. It further describes the context of this research and justifies the importance of the study of concentration and alternative metrics.

The major consequence of concentration on NN search is that indexing methods, which have an expected logarithmic cost, actually sometimes perform no better than simple linear scanning. This has been described, among others, in [32], [33], [34], and [35] and was acknowledged by many others as the curse of dimensionality. Consequently, new indexing structures, specially designed for high-dimensional data, were suggested, like the X-tree [34], the TV-tree [35], and so forth, and approximate search was proposed as a solution [36]. A survey of these methods can be found in [37]. It seems that Brin [38] was one of the pioneers to actually relate the curse of dimensionality to the particular shape of distance distributions in high-dimensional spaces. He

proposed the Geometric Near-neighbor Access Tree (GNAT) indexing structure, which is built taking into account the minimum and maximum distances in the data set. Whereas Berchtold et al. [39] suggested that the decrease of performance is due to boundary effects, coming from the fact that all points seem located in the corners of the space, Weber et al. [40] propose a review of counterintuitive properties of high-dimensional geometrical spaces and relate them to the losses of performance. In view of their theorem (Theorem 2), Beyer et al. explain the decrease of performance of the indexing methods by the instability of the search for NNs. They propose to use less concentrated metrics and further question the intrinsic relevance of NN search, independent of performance issues, for concentrated metrics. Whereas many of the responses to the curse of dimensionality are to introduce a new indexing method, Hinneburg et al. [23] and Aggarwal et al. [24] propose to use alternative metrics that are less concentrated. Their main purpose, however, is to use a more meaningful metric in high-dimensional spaces and not to accelerate the search.

In both cases, using less-concentrated norms either to perform better with indexing structures or to get a more meaningful NN search thus assumes that the concentration phenomenon is intrinsic to the notion of the norm. The aim of this paper is to show that the concentration is indeed intrinsic to the norm. To this end, we will consider results that are independent of the number of points. Considering a high-dimensional bounded space with some points scattered over it, it seems reasonable to think that the fewer points you have, the more concentrated the distances are likely to be, since the maximum distance will decrease while the minimum distance will increase. The number-of-points parameter either has to be taken into account or has to be ruled out of the equation. In the following, we propose to do this by considering data distributions instead of data sets. The conclusion of our study is that, indeed, the concentration phenomenon is intrinsic to the norm and, thus, justifies Aggarwal et al.'s work on fractional metrics to reduce concentration.

From Aggarwal et al.'s results, it is often extrapolated that using fractional distances with small values of the parameter p will decrease the concentration phenomenon. The latter fact has subsequently been used as an argument to use fractional norms in all sorts of situations without first checking for applicability. However, the restriction is that the data must be uniformly distributed. As we are dealing here with high-dimensional spaces, it appears quite evident that real data can only sparsely populate high-dimensional spaces because of their finite number and because they often lie on a submanifold. Therefore, data are far from being uniformly spread over the space. This study confirms that the distribution of data has to be taken into account to estimate the concentration. Depending on the distribution, the concentration may increase as p gets smaller; in other cases, it may also have a local maximum for some value of p, as will be shown later.

4 FURTHER THEORETICAL RESULTS

Section 2 reviewed major results from the literature. This section will describe new results in order to better understand the phenomenon of norm concentration in high-dimensional spaces. All proofs and examples are postponed to Section 5 for ease of readability.
4.1 The Finite and Bounded Sample

A finite number of points will most probably be sparsely distributed in a high-dimensional space. Most points will be far away from each other, and the density will be very low over the whole space. This is referred to as the Empty Space Phenomenon [41]. Furthermore, if the points live in a closed (bounded) region of the space, for instance, the [0,1]^d hypercube, then the maximum distance is bounded too. In such a situation, it may happen that the relative contrast is very low. The questions are then: Is the concentration phenomenon a side effect of the Empty Space Phenomenon, just because we consider a finite number of points in a bounded portion of a high-dimensional space? Would the conclusions still hold if an infinite number of points (in other words, a distribution) spanning the whole space was considered?

Unfortunately, the results by Beyer et al., Hinneburg et al., and Aggarwal et al. cannot be extended to the case where the number of data points is arbitrarily large. Indeed, the bounds on the relative contrast depend on the number of points. Furthermore, if the values the data points P^(j) can take are unbounded, the notion of relative contrast may not be relevant anymore since it relies on maximum and minimum values. In contrast to Beyer et al.'s, Hinneburg et al.'s, and Aggarwal et al.'s results, Demartines' and Yianilos' ones do not refer to a finite number of points but rather to a distribution. Unfortunately, they consider Minkowski norms only. An interesting result would be to extend their results to fractional p-norms. For that purpose, it is proposed to use the ratio

$$RV_{F,p} = \frac{\sqrt{\operatorname{Var}(\|X\|_p)}}{E(\|X\|_p)} \qquad (8)$$

as a measure of the concentration; RV_{F,p} will be called the relative variance of the norm. It is nonnegative; a small value indicates that the distribution of the norm is concentrated, whereas a larger one indicates a wide effective range of norm values. Intuitively, we can see that RV_{F,p} measures the concentration by relating a measure of spread (standard deviation) to a measure of location (expectation). In that sense, it is similar to the relative contrast, which also relates a measure of spread (range) to a measure of location (minimum).
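The relative variance is directly computable from data as well. Here is a minimal plug-in estimator (our sketch; it anticipates the estimator written out in Section 5.3):

```python
import numpy as np

def relative_variance(x, p):
    """Plug-in estimate of RV_{F,p} = sqrt(Var ||X||_p) / E ||X||_p
    from a sample x of shape (n, d)."""
    norms = np.sum(np.abs(x) ** p, axis=1) ** (1.0 / p)
    return norms.std() / norms.mean()

rng = np.random.default_rng(0)
for d in [10, 100, 1000]:
    x = rng.random((20_000, d))  # i.i.d. uniform sample
    print(f"d={d:5d}  RV(p=0.5)={relative_variance(x, 0.5):.4f}  "
          f"RV(p=2)={relative_variance(x, 2.0):.4f}")

# Both columns shrink roughly like 1/sqrt(d): every p-norm concentrates,
# which is the content of Theorem 5 below.
```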

The main result of this section is that, regarding the relative variance, all p-norms concentrate as the dimensionality of the space increases. This is stated more precisely in Theorem 5, deduced from two lemmas that, respectively, characterize the individual behaviors of the expectation and of the variance of the norm. Those lemmas are presented here, while their respective proofs can be found in Section 5.1.

Lemma 1. Let X = (X₁, ..., X_d) be a random vector with i.i.d. components X_i ~ F. Then,

$$\lim_{d\to\infty} \frac{E(\|X\|_p)}{d^{1/p}} = c_p, \qquad (9)$$

with c_p being a constant independent of the dimension but related to p and to the distribution F.

Therefore, it appears that the expectation of the p-norm of a random vector grows as the pth root of the dimension. Note that this result is consistent with (2) in the euclidean case (p = 2). The second lemma concerns the variance of the norm.

Lemma 2. Let X = (X₁, ..., X_d) be a random vector with i.i.d. components X_i ~ F. Then,

$$\lim_{d\to\infty} \frac{\operatorname{Var}(\|X\|_p)}{d^{2/p-1}} = c'_p, \qquad (10)$$

with c'_p being a constant independent of the dimension but related to p and to the distribution F.

The dependency on d^{2/p−1} indicates that the variance remains constant with the dimension for the euclidean distance. We can also see that the variance decreases when the dimension increases for p-norms with p > 2 and increases for p-norms with p < 2. This is consistent with Demartines' and Hinneburg et al.'s results. From these lemmas, it can be shown that all p-norms concentrate as the dimension increases.

Theorem 5. Let X = (X₁, ..., X_d) be a random vector with i.i.d. components X_i ~ F. Then,

$$\lim_{d\to\infty} \frac{\sqrt{\operatorname{Var}(\|X\|_p)}}{E(\|X\|_p)} = 0;$$

that is, the relative variance decreases toward zero when the dimension grows.

The proof can be found in Section 5.1. From Theorem 5, it can be concluded that the concentration of the norms in high-dimensional spaces is really an intrinsic property of the norms and not a side effect of the finite sample size or of the Empty Space Phenomenon. This result extends Demartines' one to all p-norms and does not depend on the sample size. Moreover, although all p-norms concentrate as the dimension increases, they do not concentrate in the same way. Characterizing these differences with respect to p is the topic of Section 4.2.

4.2 Impact of the Value of p on the Concentration

In the previous section, it has been shown that all p-norms concentrate, whatever the value of p (p > 0). In this section, the relationship between the relative variance and the value of p for a given dimension will be made explicit.

Theorem 6. Let X = (X₁, ..., X_d) be a random vector with i.i.d. components X_i ~ F. Then,

$$\lim_{d\to\infty} \sqrt{d}\;\frac{\sqrt{\operatorname{Var}(\|X\|_p)}}{E(\|X\|_p)} = \frac{\sigma_{F,p}}{p\,\mu_{F,p}},$$

where μ_{F,p} = E(|X_i|^p) and σ²_{F,p} = Var(|X_i|^p).

If d is large, we can thus approximate the relative variance by

$$RV_{F,p} = \frac{\sqrt{\operatorname{Var}(\|X\|_p)}}{E(\|X\|_p)} \approx \frac{1}{\sqrt{d}}\;\frac{\sigma_{F,p}}{p\,\mu_{F,p}}.$$

This means that, for a fixed (large) dimension, the relative variance evolves with p as

$$K_{F,p} = \frac{\sigma_{F,p}}{p\,\mu_{F,p}}. \qquad (11)$$

The evolution of K_{F,p} with p determines the value of the relative variance for a fixed dimension d. Aggarwal et al. have shown that, if the points are uniformly distributed, the relative contrast at fixed dimension increases as p decreases. We will show below, and prove in Section 5.2, that, under the uniform distribution hypothesis, the relative variance increases as p decreases too. However, in general, the function described in (11) is not always strictly decreasing with p, as stated in the following proposition.

Proposition 1. The relative variance (11) is (a) a strictly decreasing function of p when the variables are distributed according to a uniform distribution F over the interval [0,1], but (b) there exist distributions F for which it is not the case: Fractional norms are not always less concentrated than higher order norms.

The proof is given in Section 5.2.
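Both claims of Proposition 1 can be probed numerically. In the sketch below (our illustration; the bimodal mixture parameters are assumed for the sake of the example, since the exact values used in Section 5.2 are not reproduced here), K_{F,p} is estimated by Monte Carlo; for the uniform case, it should match the closed form 1/√(2p+1) derived in Section 5.2:

```python
import numpy as np

def K(samples, p):
    """Monte Carlo estimate of K_{F,p} = sigma_{F,p} / (p * mu_{F,p}),
    the large-d limit of sqrt(d) * RV_{F,p} (Theorem 6)."""
    s = np.abs(samples) ** p
    return s.std() / (p * s.mean())

rng = np.random.default_rng(0)
u = rng.random(500_000)                        # uniform on [0, 1]
mix = np.where(rng.random(500_000) < 0.5,      # assumed bimodal example:
               rng.normal(1.0, 1.0, 500_000),  # two Gaussian clusters,
               rng.normal(2.0, 1.0, 500_000))  # centers and variance assumed

for p in [0.25, 0.5, 1, 2, 4, 8]:
    print(f"p={p:5}: uniform K={K(u, p):.4f} "
          f"(closed form {1 / np.sqrt(2 * p + 1):.4f}),  mixture K={K(mix, p):.4f}")

# For the uniform density, K decreases monotonically in p; for bimodal
# densities, it need not (Proposition 1b). Estimates get noisy for large p,
# where very high moments are involved.
```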
There are thus data for which the 1-norm is more concentrated than the 2-norm, for instance; in general, a higher order norm can be less concentrated than a fractional norm by having a higher relative contrast or relative variance. In conclusion, using fractional norms does not always bring less concentrated distances.

4.3 The i.i.d. Hypothesis

All theorems presented in Section 4 rely on the fact that data are supposed to be i.i.d. What happens if it is not the case in practice (actually, it will hardly ever be)? Are these assumptions really needed, or do they merely make the proofs easier or even feasible? This section addresses these issues.

First, the identically distributed assumption is considered. In the previous sections, we have supposed that all X_i are distributed according to the same distribution: X_i ~ F. If the variables X_i are not identically distributed, it means that each X_i is distributed according to some distribution noted F_i: ∀i, X_i ~ F_i.

Proposition 2. If the data are not identically distributed, then the conclusion of Theorem 5 still holds provided that the data are normalized.

The proof is given in Section 5.3. Normalizing means here to subtract the mean from the variables and divide them by their standard deviation so that, ∀i, E(X_i) = 0 and Var(X_i) = 1. Normalizing data is often considered important when using norms and distances because it ensures that all variables X_i will have equal influence on the computation of the norm. If this is not the case, the variables with the largest variances will have the largest influence on the distance value, whereas the variables with low variances will have little or no influence on the computation of the norm.

Actually, norms will concentrate for nonidentically distributed data if they are normalized. However, if the data are not normalized, some variables may have too little effect on the distance value.

The independence assumption may not be met in practice either. If X has components X_i that are not independent, it means that the joint distribution F of X = (X₁, ..., X_d) differs from the product of the marginal distributions F_i. The existence of relations between the elements of X means that X lies on a (possibly nonlinear) submanifold of IR^d. If the submanifold is nonlinear, the euclidean norm, for instance, measures the distances between data points using shortcuts, that is, through a straight line in the space; this can be very different from a geodesic distance measured along the manifold [42], [43]. However, the manifold can be represented in a vector space IR^{d_int} whose dimension is the number of degrees of freedom of the manifold. Dimension d_int is called the intrinsic dimension and d the embedding dimension. Each realization of X in the embedding space may then be mapped to a unique realization in a d_int-dimensional projection space. One may thus expect that the asymptotic properties of the norms behave similarly in both spaces. Actually, the measure of intrinsic dimensionality proposed by Chavez et al. [1] is precisely the inverse of the square of the relative variance. Korn et al. [22] relate the concentration to the fractal dimension of the data, which is another way to measure its intrinsic dimensionality. The fact that the concentration depends on the intrinsic dimension of the data is often admitted [1], even if there is no consensus about the actual definition of the intrinsic dimension. As a consequence, high-dimensional data that present a lot of correlations or dependencies between variables will be much less concentrated than if all variables were independent. The conclusion we can draw is that the concentration phenomenon depends on the intrinsic dimension of the data more than on the dimension of the embedding space.

5 PROOFS AND ILLUSTRATIONS

In this section, proofs are given for Lemmas 1 and 2, for Theorems 5 and 6, and for Propositions 1 and 2.

5.1 Proof of Theorem 5

The proof of Theorem 5 is based on Lemmas 1 and 2. Therefore, the proofs of the lemmas are given first.

5.1.1 Proof of Lemma 1

The proof requires two steps. First, it will be shown that, under the assumptions of Lemma 1, we have

$$P\!\left[\lim_{d\to\infty} \frac{\|X\|_p}{d^{1/p}} = c_p\right] = 1, \qquad (12)$$

where c_p is a constant independent of the dimension d of X (Step 1). Then, this result will be extended to the expectation of the norm (Step 2).

Step 1. Let S_i = |X_i|^p for i = 1, ..., d. The S_i are i.i.d. as well; let μ_{F,p} be their expectation. The Strong Law of Large Numbers (SLLN) allows us to write

$$P\!\left[\lim_{d\to\infty} \frac{1}{d}\sum_{i=1}^{d} S_i = \mu_{F,p}\right] = 1.$$

Then,

$$P\!\left[\lim_{d\to\infty} \left(\frac{1}{d}\sum_{i=1}^{d} S_i\right)^{1/p} = \mu_{F,p}^{1/p}\right] = 1,$$

or, by definition of the S_i,

$$P\!\left[\lim_{d\to\infty} \left(\frac{1}{d}\sum_{i=1}^{d} |X_i|^p\right)^{1/p} = \mu_{F,p}^{1/p}\right] = 1.$$

This is nothing else than

$$P\!\left[\lim_{d\to\infty} \frac{\|X\|_p}{d^{1/p}} = \mu_{F,p}^{1/p}\right] = 1, \qquad (13)$$

which concludes the proof of (12) with c_p = μ_{F,p}^{1/p}.

Step 2. By (13), for any realization x of X except those in a set Λ ⊂ IR^d of measure 0, we have lim_{d→∞} ||x||_p / d^{1/p} = μ_{F,p}^{1/p}. Thus,

$$\lim_{d\to\infty} \int_{\mathrm{IR}^d \setminus \Lambda} F(x)\,\frac{\|x\|_p}{d^{1/p}}\,dx = \int_{\mathrm{IR}^d \setminus \Lambda} F(x)\,\mu_{F,p}^{1/p}\,dx = \mu_{F,p}^{1/p}, \qquad (14)$$

with the last equality coming from the fact that μ_{F,p}^{1/p} is bounded (the exchange of limit and integral is justified by dominated convergence). In addition, we have

$$\int_{\mathrm{IR}^d \setminus \Lambda} F(x)\,\frac{\|x\|_p}{d^{1/p}}\,dx = \int_{\mathrm{IR}^d} F(x)\,\frac{\|x\|_p}{d^{1/p}}\,dx, \qquad (15)\text{--}(16)$$

since Λ is of measure 0. Combining (14) and (16) gives

$$\lim_{d\to\infty} \frac{E(\|X\|_p)}{d^{1/p}} = \lim_{d\to\infty} \int_{\mathrm{IR}^d} F(x)\,\frac{\|x\|_p}{d^{1/p}}\,dx = \mu_{F,p}^{1/p}, \qquad (17)$$

which concludes the proof since the right-hand side of (17) is independent of the dimension d. □
5.1.2 Proof of Lemma 2

Let μ_{F,p} be the expectation of the variable |X_i|^p and σ²_{F,p} its variance. For random vectors X of dimension d and a given p-norm, we have

$$\frac{\operatorname{Var}(\|X\|_p)}{d^{2/p-1}} = \frac{E\big[(\|X\|_p - E\|X\|_p)^2\big]}{d^{2/p-1}} = E\left[\left(\frac{\|X\|_p - E\|X\|_p}{d^{1/p-1/2}}\right)^{2}\right].$$

Since

$$\|X\|_p - E\|X\|_p = \frac{\|X\|_p^{\,p} - (E\|X\|_p)^{p}}{\sum_{r=0}^{p-1} \|X\|_p^{\,r}\,(E\|X\|_p)^{\,p-1-r}},$$

we have

$$\frac{\|X\|_p - E\|X\|_p}{d^{1/p-1/2}} = \frac{\big(\|X\|_p^{\,p} - (E\|X\|_p)^{p}\big)\big/\sqrt{d}}{\sum_{r=0}^{p-1} \big(\|X\|_p/d^{1/p}\big)^{r}\,\big(E\|X\|_p/d^{1/p}\big)^{\,p-1-r}}.$$

Therefore,

$$\lim_{d\to\infty}\frac{\operatorname{Var}(\|X\|_p)}{d^{2/p-1}} = \lim_{d\to\infty} E\left[\left(\frac{\big(\|X\|_p^{\,p} - (E\|X\|_p)^{p}\big)\big/\sqrt{d}}{\sum_{r=0}^{p-1} \big(\|X\|_p/d^{1/p}\big)^{r}\,\big(E\|X\|_p/d^{1/p}\big)^{\,p-1-r}}\right)^{2}\right].$$

Using the theorem by Lebesgue and Fatou about the convergence of integrable functions, we can swap the limit and the expectation operators; the transition is allowed since the denominator does not tend toward zero. Because of Lemma 1, we know that E||X||_p / d^{1/p} tends toward a constant as d increases. Furthermore, we have seen that ||X||_p / d^{1/p} almost surely, that is, with probability 1, also tends toward a constant. Therefore, the denominator almost surely tends toward a constant as the dimension grows:

$$P\left[\lim_{d\to\infty} \sum_{r=0}^{p-1} \left(\frac{\|X\|_p}{d^{1/p}}\right)^{r}\left(\frac{E\|X\|_p}{d^{1/p}}\right)^{p-1-r} = p\,\mu_{F,p}^{(p-1)/p}\right] = 1. \qquad (18)$$

If we focus on the numerator now, we notice that

$$\|X\|_p^{\,p} - (E\|X\|_p)^{p} = \sum_{i=1}^{d} |X_i|^p - (E\|X\|_p)^{p},$$

and, using the result from Lemma 1, (E||X||_p)^p can be replaced by d μ_{F,p} in the limit. The numerator can thus be written as

$$\lim_{d\to\infty} E\left[\frac{1}{d}\left(\sum_{i=1}^{d} |X_i|^p - d\,\mu_{F,p}\right)^{2}\right].$$

Since

$$E\left[\sum_{i=1}^{d} |X_i|^p - d\,\mu_{F,p}\right] = \sum_{i=1}^{d} E|X_i|^p - d\,\mu_{F,p} = 0,$$

we have

$$E\left[\left(\sum_{i=1}^{d} |X_i|^p - d\,\mu_{F,p}\right)^{2}\right] = \operatorname{Var}\left(\sum_{i=1}^{d} |X_i|^p\right) = \sum_{i=1}^{d}\operatorname{Var}(|X_i|^p) = d\,\sigma^2_{F,p},$$

leading to the conclusion that

$$\lim_{d\to\infty} E\left[\frac{1}{d}\left(\sum_{i=1}^{d} |X_i|^p - d\,\mu_{F,p}\right)^{2}\right] = \sigma^2_{F,p}. \qquad (19)$$

Using expression (19) for the numerator and (18) for the denominator, the result is that

$$P\left[\lim_{d\to\infty} \frac{\operatorname{Var}(\|X\|_p)}{d^{2/p-1}} = \frac{\sigma^2_{F,p}}{\big(p\,\mu_{F,p}^{(p-1)/p}\big)^{2}}\right] = 1. \qquad (20)$$

The same arguments as in Step 2 from the proof of Lemma 1 can be used to get

$$\lim_{d\to\infty} \frac{\operatorname{Var}(\|X\|_p)}{d^{2/p-1}} = \frac{\sigma^2_{F,p}}{\big(p\,\mu_{F,p}^{(p-1)/p}\big)^{2}}, \qquad (21)$$

which proves the lemma since the right-hand side of (21) is independent of the dimension d. □

5.1.3 Proof of Theorem 5

Lemmas 1 and 2 are used to prove Theorem 5. Writing

$$\frac{\sqrt{\operatorname{Var}\|X\|_p}}{E\|X\|_p} = \frac{\sqrt{\operatorname{Var}\|X\|_p\,/\,d^{2/p-1}}}{E\|X\|_p\,/\,d^{1/p}}\cdot\frac{1}{\sqrt{d}}$$

and taking the limit, we have, by Lemmas 1 and 2,

$$\lim_{d\to\infty} \frac{\sqrt{\operatorname{Var}\|X\|_p}}{E\|X\|_p} = \frac{\sqrt{c'_p}}{c_p}\,\lim_{d\to\infty}\frac{1}{\sqrt{d}} = 0. \qquad \square$$

5.2 Proof of Theorem 6 and Proposition 1

Similar to Theorem 5, the proof of Theorem 6 is based on Lemmas 1 and 2. Theorem 6 is proven as follows: From Lemmas 1 and 2,

$$\lim_{d\to\infty} \sqrt{d}\;\frac{\sqrt{\operatorname{Var}\|X\|_p}}{E\|X\|_p} = \lim_{d\to\infty}\frac{\sqrt{\operatorname{Var}\|X\|_p\,/\,d^{2/p-1}}}{E\|X\|_p\,/\,d^{1/p}} = \frac{\sqrt{c'_p}}{c_p}. \qquad (22)$$

Using the values of c_p and c'_p, respectively, from (17) and (21), we have

$$\frac{\sqrt{c'_p}}{c_p} = \frac{\sigma_{F,p}}{p\,\mu_{F,p}^{(p-1)/p}\,\mu_{F,p}^{1/p}} = \frac{\sigma_{F,p}}{p\,\mu_{F,p}}. \qquad (23)$$

Proposition 1 contains two claims; the proofs are, respectively, given in Part a and Part b.

Part a. If F is the uniform distribution over the interval [0,1], μ_{F,p} is given by

$$\mu_{F,p} = \frac{1}{p+1}.$$

Since σ²_{F,p} = μ_{F,2p} − μ²_{F,p}, we have

$$\sigma_{F,p} = \sqrt{\frac{1}{2p+1} - \frac{1}{(p+1)^2}}.$$

Therefore,

$$\frac{\sigma_{F,p}}{p\,\mu_{F,p}} = \frac{p+1}{p}\sqrt{\frac{1}{2p+1} - \frac{1}{(p+1)^2}} = \sqrt{\frac{1}{2p+1}}. \qquad (24)$$

We can conclude that, under the uniform distribution and large dimension d hypotheses, the relative variance decreases with p. The concentration of the norm thus increases as p grows.

Part b. A counterexample is provided to prove assertion b of the proposition. Let us consider a situation where data are dispatched into two Gaussian clusters with common variance σ², respectively centered on two distinct values μ₁ and μ₂. The marginal distribution of each X_i is then

$$F_2(x) = \mathrm{cst}\left(e^{-\frac{(x-\mu_1)^2}{2\sigma^2}} + e^{-\frac{(x-\mu_2)^2}{2\sigma^2}}\right). \qquad (25)$$

Fig. 3. Relative variance for data distributed as F₂, as a function of p. We can see that a maximum is obtained for a rather large value of p, far from 1.

In this example, the relative variance is higher for higher order norms than for fractional norms, as illustrated in Fig. 3; consequently, fractional norms are more concentrated than higher order norms with values of p in [8, 30]. The next examples are taken from real data sets used by Aggarwal et al. in [24]: the segmentation data set and the Wisconsin Diagnostic Breast Cancer (WDBC) data set from the University of California, Irvine (UCI) Machine Learning Repository [44].

Fig. 4. (a) Relative contrast for the segmentation and (b) the WDBC data sets. The relative contrast is higher for higher order norms than for the fractional norms.

Fig. 4 shows a plot of the relative contrast as a function of p. In Fig. 4a (the segmentation data set), between p = 1 and p = 2, the relative contrast is actually increasing. This is a real example where the euclidean norm is actually less concentrated than the 1-norm. Fig. 4b is even more interesting: Here, the relative contrast is consistently better for higher order norms than for fractional norms. □

5.3 Proofs for Proposition 2

In Section 5.1, the proof of Theorem 5 was built using the Law of Large Numbers. Although this law is often stated with the i.i.d. hypothesis, the identically distributed assumption is sufficient but not necessary. Actually, if the data are normalized, the assumption of identical distributions is not needed. It has been shown that a less restrictive sufficient condition for the Law of Large Numbers to hold is that

the set of the X_i must be uniformly integrable [45]. This means that, for any ε > 0, there exists some K < ∞ such that, for every X_i,

$$\int_{|\xi| \ge K} F_i(\xi)\,|\xi|\;d\xi < \varepsilon. \qquad (26)$$

Note that (26) is actually the expectation of |X_i| restricted to the values |X_i| ≥ K. It is equivalent to [45]

$$\exists\, k > 1:\ \sup_i E|X_i|^k < \infty. \qquad (27)$$

If the data are normalized, E(X_i) = 0 and Var(X_i) = 1 for all i. We then have

$$1 = \operatorname{Var}(X_i) = E(X_i^2) - \big(E(X_i)\big)^2 = E(X_i^2) = E|X_i|^2.$$

Taking k = 2 in (27) leads to the conclusion that if the data are centered and reduced, as is often done in practice, then they are uniformly integrable. In this case, the Law of Large Numbers still holds without the assumption of equal distributions. Therefore, all subsequent results are valid provided that the data are normalized; this proves Proposition 2.

To show that the independence part of the i.i.d. hypothesis is not necessary either, we will compare the concentration of norms in real data sets with the same embedding dimensions but different intrinsic dimensions. Suppose we have some data described in a d-dimensional space for which we know that the intrinsic dimension d_int is lower. Such a situation occurs, for example, when the variables are significantly correlated. We will see that, for such data, the value of the relative variance is much more similar to the one of a d_int-dimensional data set with independent variables than to the one of a d-dimensional data set with independent variables. For that purpose, we will ensure that the marginal distributions, that is, the distributions of each variable taken separately, are identical in both data sets. Suppose we have Ω = {x^(j)}, 1 ≤ j ≤ n, a sample drawn from X = (X₁, ..., X_d) ~ F(ξ₁, ξ₂, ..., ξ_d), where F is the multivariate probability density function of X. The marginal distributions of the X_i are

$$F_i(\xi_i) = \int\!\!\cdots\!\!\int F(\xi_1, \ldots, \xi_i, \ldots, \xi_d)\;d\xi_1 \cdots d\xi_{i-1}\,d\xi_{i+1} \cdots d\xi_d.$$

To produce a data set Ω′ that is marginally identically distributed as Ω, we propose the following: If we consider a matrix where each row corresponds to a data element x^(j) and each column to a variable X_i, the values in each column are randomly permuted. As a consequence, the marginal distributions of each variable will not change. By contrast, all relationships between variables are destroyed in the process. Therefore, we obtain a sample Ω′ that is marginally distributed as Ω, but where the components are now independent; the intrinsic dimension of Ω′ is thus equal to its embedding dimension.

Let us denote by RV_{d,2}(Ω) the estimation of RV_{F,2} with the euclidean norm, given the data set Ω with high embedding dimension and low intrinsic dimension. We will compare this value to the value of the relative variance of Ω′, RV_{d,2}(Ω′), which has a high intrinsic dimension. Moreover, we will compare those values to the value of the relative variance for a data set Ω_s made of a small number (typically 10 times lower than the original number of variables) of low-correlated variables from Ω, RV_{d_s,2}(Ω_s) (small embedding dimension). The relative variance of a uniformly distributed synthetic data set of dimensionality d, RV_{d,2}(Ω_rand) (high intrinsic dimension), is also computed. It is expected that the relative variances for data sets with low intrinsic dimension will be similar, while being much lower than the relative variances of the data sets with high intrinsic dimension.
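The column-permutation construction is a one-liner to implement. The following sketch (our illustration; synthetic data with five latent factors stands in for the real data sets of Table 1) destroys the dependencies between variables while keeping every marginal distribution intact, and compares the resulting relative variances:

```python
import numpy as np

def relative_variance(x, p=2.0):
    """Plug-in estimate of sqrt(Var ||X||_p) / E ||X||_p (euclidean by default)."""
    norms = np.sum(np.abs(x) ** p, axis=1) ** (1.0 / p)
    return norms.std() / norms.mean()

rng = np.random.default_rng(0)

# Synthetic stand-in for a real data set: embedding dimension 100,
# intrinsic dimension ~5 (linear mixing of 5 latent factors plus small noise).
latent = rng.normal(size=(2000, 5))
x = latent @ rng.normal(size=(5, 100)) + 0.1 * rng.normal(size=(2000, 100))

# Permute each column independently: the marginals are unchanged,
# but all dependencies between variables are destroyed.
x_perm = np.column_stack([rng.permutation(col) for col in x.T])

print("RV, original  (low intrinsic dim): ", round(relative_variance(x), 4))
print("RV, permuted (high intrinsic dim): ", round(relative_variance(x_perm), 4))
# The permuted data behave like genuinely 100-dimensional data and show a
# much smaller relative variance, i.e., stronger concentration.
```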
The relative variance for a data set Ω = {x^(j)}, 1 ≤ j ≤ n, is estimated as follows:

$$\widehat{RV} = \frac{\sqrt{\frac{1}{n}\sum_{j=1}^{n}\|x^{(j)}\|^2 - \left(\frac{1}{n}\sum_{j=1}^{n}\|x^{(j)}\|\right)^{2}}}{\frac{1}{n}\sum_{j=1}^{n}\|x^{(j)}\|}.$$

The data sets mentioned in Table 1 are high-dimensional data coming from various domains. The first two data sets are the near-infrared spectra of apple and meat,

TABLE 1
Relative Variances for Several Real Data Sets Compared with the Relative Variances of Artificial i.i.d. Data Sets
(The numbers between parentheses are the numbers of variables used to build Ω_s.)

respectively; the next two data sets are recorded sounds of "yes" and "no" and of "boat" and "goat" [46]. The ionosphere and the musk data sets come from the UCI machine learning repository. The dimensionality goes from 34 to more than 8,000. As expected, it is seen in Table 1 that the relative variance of the original data set (column 1) is very similar to the relative variance of a subset of its variables (column 3). In contrast, the relative variance for the crafted data set Ω′ (column 2) is very similar to the relative variance of a random uniformly distributed data set of the same dimension (column 4). □

1. Available at http://
2. Available at http://nathalie.vialaneix.free.fr/maths/article.php3?id_article=0
3. Available at http://

6 ABOUT THE OPTIMAL VALUE OF p

Throughout the preceding sections, we saw that all fractional distances concentrate in high-dimensional spaces and that concentration depends nontrivially on the value of p. This concentration has negative consequences on indexing methods and leads to questioning the meaningfulness of the NN search when the distances seem to be all identical. This motivates the use of alternative metrics to the widely used euclidean one.

6.1 Fractional Norms and Non-Gaussian Noise

One more argument can motivate the use of fractional norms. It can easily be shown that the concept of euclidean distance and the concept of Gaussian white noise are intimately tied. A Gaussian white noise is a noise that has a normal distribution with zero mean and equal variance for all variables. Looking for the data element most similar to a query point by minimizing the euclidean distance is equivalent to choosing as the NN the point most likely to be the query point under the Gaussian white noise scheme. In low-dimensional spaces, most often, Gaussian white noise is an acceptable assumption. However, when the dimension increases, other noise schemes might be more appropriate. The white Gaussian noise is a model that describes small alterations gently distributed over the coordinates. Obviously, in low-dimensional spaces, this is the only kind of noise we can cope with. Imagine now a noise scheme that models large alterations of only some of the coordinates instead of small alterations of all coordinates. This kind of noise will generate so-called outliers in low-dimensional spaces, but in higher dimensions, where more information is available for each data element, it can just be handled as a noise scheme. Examples of such noise schemes are the so-called salt-and-pepper noise on images and burst noise on time series; encoding errors may also be viewed as noise that sometimes dramatically affects a small number of the coordinates. In [47], a colored noise scheme is studied, which concentrates its effects on some of the coordinates while leaving the other ones nearly unchanged. The experiments on high-dimensional data show that, for such a noise, fractional norms are better at identifying the real NNs, that is, the original point when the noise is removed. For Gaussian white noise, however, the euclidean norm gives better results. Similarly, in [24], the experiments show that fractional norms are better suited for classification when masking noise is applied. This noise scheme randomly changes some values of the coordinates; it is very different from Gaussian noise. All these results clearly illustrate the fact that the optimal value of p is highly application dependent.

6.2 Choosing the Norm for Regression/Classification

In a prediction or classification problem, the value of p could be chosen so as to get the best model performances, according to the expected error in predicting the response value or the class label. It would thus be considered as an additional parameter of the model: The norm that is chosen is the norm that minimizes the differences between the true response values or class labels and the predicted ones. However, this necessitates multiplying the computation times by the number of different values of p that are tested. An elegant alternative is to choose the value of p prior to the building of the model. In this case, the norm is chosen before any prediction model is constructed. It is chosen according to a statistical measure of relevance for each p-norm that is considered. This statistical measure gives each p-norm a score based on how well data elements that are similar according to the p-norm relate to similar response values or class labels. If the data elements that are close in the data space, that is, similar according to the norm, are also close in terms of associated response value, then the norm is considered relevant as a measure of similarity for those data. Nonparametric noise estimators such as the Gamma test [48] or the Differogram [49] are suitable for this, as are the performances of a 1-NN model. This latter strategy (the 1-NN model) is illustrated in the experiments below and sketched right after.
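A minimal version of this selection step (our sketch; the placeholder data and candidate exponents are not the paper's exact protocol, and the Gamma test and Differogram alternatives are not shown) scores each candidate p-norm by the leave-one-out error of a 1-NN predictor and keeps the best one:

```python
import numpy as np

def loo_1nn_rmse(x, y, p):
    """Leave-one-out RMSE of a 1-NN regressor under the p-norm."""
    dist = np.sum(np.abs(x[:, None, :] - x[None, :, :]) ** p, axis=2)
    np.fill_diagonal(dist, np.inf)  # a point may not be its own neighbor;
    nn = dist.argmin(axis=1)        # sum of |diff|^p is monotone in the p-norm
    return np.sqrt(np.mean((y - y[nn]) ** 2))

rng = np.random.default_rng(0)
x = rng.random((300, 20))                 # placeholder features
y = x[:, 0] + 0.1 * rng.normal(size=300)  # placeholder response

scores = {p: loo_1nn_rmse(x, y, p) for p in [0.25, 0.5, 1.0, 2.0, 4.0]}
best_p = min(scores, key=scores.get)
print(scores, "-> chosen p:", best_p)
# The selected p-norm would then be used inside the final model
# (an RBFN in the paper's experiments).
```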
This noise scheme randomly changes some values of the coordinates; it is very different from Gaussian noise. All these results clearly illustrate the fact that the otimal value of is highly alication deendent. 6. Choosing the Norm for Regression/ Classification In a rediction or classification roblem, the value of could be chosen so as to get the best model erformances, according to the exected error in redicting the resonse value or the class label. It would thus be considered as an additional arameter to the model The norm that is chosen is the norm that minimizes the differences between the true resonse values or class labels and the redicted ones. However, this would necessitate multilying the comutation times by as many different values of are tested. An elegant alternative is to choose the value rior to the building of the model. In this case, the norm is chosen before any rediction model is constructed. It is chosen according to a statistical measure of relevance for each -norm that is considered. This statistical measure gives each -norm a score based on how well similar data elements according to the -norm relate to similar resonse values or class labels. If the data elements that are close in the data sace, that is, similar according to the norm, are also close in terms of associated resonse value, then the norm is considered relevant as a measure of similarity for those data. Nonarametric noise estimators such as the Gamma test [48] or the Differogram [49] are suitable for this, as well as the erformances of a -NN model. This latter strategy (-NN model) is illustrated in the following exeriments. We consider real data, namely, the Housing 4 data set, and the Wine, Tecator, and Ale 5 data sets. The objective in the Housing data set is to redict the values of houses (in thousand dollars) described by 3 attributes reresenting the demograhic statistics of the area around each house. The data set contains 506 instances slit into 338 learning examles and 69 for testing. The 4. Available at the UCI reository. 5. All of them are available at htt//

We consider real data, namely, the Housing data set (footnote 4) and the Wine, Tecator, and Apple data sets (footnote 5). The objective in the Housing data set is to predict the value of houses (in thousands of dollars) described by 13 attributes representing the demographic statistics of the area around each house. The data set contains 506 instances, split into 338 learning examples and the rest for testing. The other three data sets are spectral data: for the Wine, the alcohol content must be predicted; for the Tecator, it is the fat content; and for the Apple data set, it is the sugar content.

4. Available at the UCI repository.
5. All of them are available at http://

TABLE 2
Performances of the RBFN on Several Data Sets with the Euclidean Norm and with Fractional Norms
(The data are altered with burst noise. The norm in the RBFN is chosen according to the leave-one-out performances of a 1-NN.)

A burst noise was added to the data: Some coordinates were altered with a multiplicative noise of high amplitude. With such a noise scheme, which is strongly non-Gaussian, fractional norms are better suited to measure the similarity. The leave-one-out error of a 1-NN is used to measure the relevance of each norm and to choose the most relevant one. Then, a Radial Basis Function Network (RBFN) [50] is built with the chosen norm. The parameters of the RBFN are chosen by fourfold cross validation. Table 2 reports the Root Mean Square Error (RMSE) of prediction on an independent test set for all the data sets. For each of these data sets, the fractional norm, chosen prior to the building of the model, gives better results than the euclidean norm.

6.3 Choosing the Norm in Content-Based Similarity Search

In many multimedia database systems, the user feedback can be used to estimate the relevance of the result of the search for similar elements. This can be translated into the relevance of the metric used to get similar elements. For instance, in image and text retrieval, the relevance measure is used to weight the coordinates in the computation of the distance to better reflect the perceived (subjective) similarity between objects [51], [52]. A relevance feedback algorithm is then given as follows: Suppose that all elements are described by a feature vector; given a query Q, the system retrieves the nearest or the k-NNs according to several p-norms (with several fractional and integer values of p, for instance). The user then identifies the most relevant results, and the score of the p-norms corresponding to those results is increased. After several iterations, the p-norm with the highest score is chosen.

To illustrate this procedure, the XM2VTS face database is used. This database is comprised of 600 photographs, altered with salt-and-pepper noise (some pixels are set to black or white randomly), of 200 individuals (that is, three pictures per individual). At each iteration, a picture from the data set is considered as a query point. Its NN according to each of several p-norms is retrieved, and the score of the norms for which the retrieved image corresponds to the same individual is increased by one. Fig. 5 shows the evolution of the score for each p-norm over the first iterations and over iterations 580 to 600. We can see that the 1/2-norm is the one with the highest score after the first iterations and is still the one after all 600 iterations. The 1/4- and 1/8-norms also have high scores, in contrast to higher order norms, which perform poorly. The same experiment was repeated 100 times with only a small number of iterations each; in 86 percent of these experiments, the 1/2-norm was identified as the most relevant. The 1/8-norm was chosen as the most relevant three times, the 1/4-norm only once, the 1-norm 10 times, and the other (higher order) norms were never chosen.

Fig. 5. The evolution of the score of each norm as a function of the number of iterations for the face image database. The 1/2-norm appears as the most relevant from the first iterations onward.
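The scoring loop itself is tiny. A toy version (ours; the random "pictures", noise level, and candidate exponents are stand-ins for the XM2VTS setup) reads:

```python
import numpy as np

def score_norms(data, labels, queries, ps):
    """Relevance-feedback scoring: a p-norm earns one point each time the
    nearest neighbor it retrieves belongs to the same individual as the query."""
    scores = {p: 0 for p in ps}
    for q in queries:
        for p in ps:
            d = np.sum(np.abs(data - data[q]) ** p, axis=1)  # monotone in the p-norm
            d[q] = np.inf                                    # exclude the query itself
            if labels[d.argmin()] == labels[q]:
                scores[p] += 1
    return scores

rng = np.random.default_rng(0)
# Stand-in data: 600 "pictures" (3 per individual), salt-and-pepper noise.
proto = np.repeat(rng.random((200, 64)), 3, axis=0)
noisy = np.where(rng.random(proto.shape) < 0.2,                  # 20% of pixels hit
                 rng.integers(0, 2, proto.shape).astype(float),  # set to 0 or 1
                 proto)
labels = np.repeat(np.arange(200), 3)

queries = rng.choice(len(noisy), size=20, replace=False)
print(score_norms(noisy, labels, queries, ps=[0.25, 0.5, 1, 2, 4]))
# Under such impulsive noise, fractional exponents tend to collect the
# highest scores, mirroring the behavior reported in Fig. 5.
```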
In conclusion, fractional norms should be used when they are a more relevant measure of similarity and, hence, increase the performances, rather than because of concentration considerations.

7 CONCLUSIONS

A comparison between data elements is often performed using the euclidean norm and distance. In high-dimensional spaces, however, norms concentrate. This means that all pairwise distances in a data set are very similar and can

lead to questioning the relevance of the euclidean distance for measuring similarities in high-dimensional spaces. This paper considers the problem of the concentration of the distances independently from the number of data points and shows that the concentration is indeed an intrinsic property of the distances and not due to finite sample size. This paper furthermore shows that:

- all p-norms concentrate, even when an infinite number of data points (that is, a distribution) is considered, including when the distribution is unbounded;
- the value of p and the shape of the distribution of high-dimensional data influence the value of the relative variance, which is derived for any distribution and used as a measure of concentration;
- consequently, the exponent of the norm can be adjusted to fight the concentration phenomenon;
- there exist distributions for which the relative contrast does not increase when the exponent of the norm decreases;
- the identically distributed hypothesis in the concentration-of-the-norm theorems is not necessary as soon as the variables are normalized;
- the concentration phenomenon is more related to the intrinsic dimension of the data than to their embedding dimension, which makes its consequences in practical situations less severe than mathematically expected; and
- the optimal metric is highly application dependent, and some sort of supervision is needed to optimally choose the metric.

Fractional norms are not always less concentrated than other norms. They seem, however, to be more relevant as a measure of similarity when the noise affecting the data is strongly non-Gaussian.

ACKNOWLEDGMENTS

The work of Damien François is funded by a grant from the Belgian FRIA. Part of his work and of the work of Vincent Wertz is supported by the Interuniversity Attraction Pole (IAP), initiated by the Belgian Federal State, Ministry of Sciences, Technologies and Culture. The scientific responsibility rests with the authors.

REFERENCES

[1] E. Chavez, G. Navarro, R.A. Baeza-Yates, and J.L. Marroquin, "Searching in Metric Spaces," ACM Computing Surveys, vol. 33, no. 3, pp. 273-321, Sept. 2001.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press Professional, 1990.
[3] A.K. Jain, M.N. Murty, and P.J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, Sept. 1999.
[4] P. Demartines, "Analyse de Données par Réseaux de Neurones Auto-Organisés," PhD dissertation, Institut Nat'l Polytechnique de Grenoble, Grenoble, France, 1994 (in French).
[5] K.S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, "When Is Nearest Neighbor Meaningful?" Proc. Seventh Int'l Conf. Database Theory, pp. 217-235, Jan. 1999.
[6] Y. Tao, J. Sun, and D. Papadias, "Analysis of Predictive Spatio-Temporal Queries," ACM Trans. Database Systems, vol. 28, no. 4, Dec. 2003.
[7] N. Katayama and S. Satoh, "Distinctiveness-Sensitive Nearest-Neighbor Search for Efficient Similarity Retrieval of Multimedia Information," Proc. 17th Int'l Conf. Data Eng. (ICDE '01), Apr. 2001.
[8] N. Katayama and S. Satoh, "Similarity Image Retrieval with Significance-Sensitive Nearest-Neighbor Search," Proc. Sixth Int'l Workshop Multimedia Information Systems (MIS '00), Oct. 2000.
[9] J. Yang, A. Patro, S. Huang, N. Mehta, M.O. Ward, and E.A. Rundensteiner, "Value and Relation Display for Interactive Exploration of High Dimensional Datasets," Proc. IEEE Symp. Information Visualization (InfoVis '04), W.A. Burks, ed., Oct. 2004.
[10] E. Tuncel, H. Ferhatosmanoglu, and K.
Rose, "VQ-Index: An Index Structure for Similarity Searching in Multimedia Databases," Proc. 10th ACM Int'l Conf. Multimedia, Dec. 2002.
[11] N. Mamoulis, D.W. Cheung, and W. Lian, "Similarity Search in Sets and Categorical Data Using the Signature Tree," Proc. 19th Int'l Conf. Data Eng. (ICDE '03), Mar. 2003.
[12] P. Ciaccia and M. Patella, "PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces," Proc. 16th Int'l Conf. Data Eng. (ICDE '00), W.A. Burks, ed., Mar. 2000.
[13] M. Demirbas and H. Ferhatosmanoglu, "Peer-to-Peer Spatial Queries in Sensor Networks," Proc. Third IEEE Int'l Conf. Peer-to-Peer Computing (P2P '03), Sept. 2003.
[14] D.S. Johnson, S. Krishnan, J. Chhugani, S. Kumar, and S. Venkatasubramanian, "Compressing Large Boolean Matrices Using Reordering Techniques," Proc. 30th Int'l Conf. Very Large Data Bases (VLDB '04), Sept. 2004.
[15] K. Chakrabarti and S. Mehrotra, "The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces," Proc. 15th Int'l Conf. Data Eng. (ICDE '99), Feb. 1999.
[16] K. Chakrabarti and S. Mehrotra, "Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces," Proc. 26th Int'l Conf. Very Large Data Bases (VLDB '00), Sept. 2000.
[17] T. Liu, A. Moore, A. Gray, and K. Yang, "An Investigation of Practical Approximate Nearest Neighbor Algorithms," Proc. 18th Ann. Conf. Neural Information Processing Systems (NIPS '04), Dec. 2004.
[18] K.P. Bennett, U. Fayyad, and D. Geiger, "Density-Based Indexing for Approximate Nearest-Neighbor Queries," Proc. Fifth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, Aug. 1999.
[19] S. Berchtold, C. Böhm, H.V. Jagadish, H.-P. Kriegel, and J. Sander, "Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces," Proc. 16th Int'l Conf. Data Eng. (ICDE '00), Mar. 2000.
[20] K.Y. Yip, D.W. Cheung, and M.K. Ng, "A Highly-Usable Projected Clustering Algorithm for Gene Expression Profiles," Proc. Workshop Data Mining in Bioinformatics (BIOKDD '03), Aug. 2003.
[21] M. Vlachos, D. Gunopulos, and G. Kollios, "Robust Similarity Measures for Mobile Object Trajectories," Proc. 13th Int'l Workshop Database and Expert Systems Applications (DEXA '02), Sept. 2002.
[22] F. Korn, B.-U. Pagel, and C. Faloutsos, "On the Dimensionality Curse and the Self-Similarity Blessing," IEEE Trans. Knowledge and Data Eng., vol. 13, no. 1, pp. 96-111, Jan./Feb. 2001.
[23] A. Hinneburg, C.C. Aggarwal, and D.A. Keim, "What Is the Nearest Neighbor in High Dimensional Spaces?" Proc. 26th Int'l Conf. Very Large Data Bases (VLDB '00), A. El Abbadi, M.L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, and K.-Y. Whang, eds., Sept. 2000.
[24] C.C. Aggarwal, A. Hinneburg, and D.A. Keim, "On the Surprising Behavior of Distance Metrics in High Dimensional Spaces," Proc. Eighth Int'l Conf. Database Theory, J. Van den Bussche and V. Vianu, eds., Jan. 2001.
[25] C.C. Aggarwal, "Re-Designing Distance Functions and Distance-Based Applications for High Dimensional Data," Proc. ACM Int'l Conf. Management of Data (SIGMOD '01), vol. 30, no. 1, pp. 13-18, Mar. 2001.
[26] K. Doherty, R. Adams, and N. Davey, "Non-Euclidean Norms and Data Normalisation," Proc. 12th European Symp. Artificial Neural Networks, M. Verleysen, ed., Apr. 2004.

Damien François received the MSc degree in computer science and the PhD degree in applied mathematics from the Université catholique de Louvain, Belgium, in 2002 and 2007, respectively. He is now a research assistant at the Center for Systems Engineering and Applied Mechanics (CESAME) and a member of the Machine Learning Group at the Université catholique de Louvain. His main interests include high-dimensional data analysis, distance-based prediction models, and feature/model selection.

Vincent Wertz received the engineering degree in applied mathematics in 1978 and the PhD degree in control engineering in 1982 from the Université catholique de Louvain, Louvain-la-Neuve, Belgium. He is now a professor at the Université catholique de Louvain after having held a permanent position with the Belgian National Fund for Scientific Research (NFSR). His main research interests are in the fields of identification, predictive control, fuzzy control, and industrial applications. Lately, he has also been involved in a major pedagogical reform of the first and second year teaching program in the School of Engineering. He is a member of the IEEE.

Michel Verleysen received the MS and PhD degrees in electrical engineering from the Université catholique de Louvain, Belgium, in 1987 and 1992, respectively. He was an invited professor at the Swiss Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland, in 1992, at the Université d'Evry Val d'Essonne, France, in 2001, and at the Université Paris I-Panthéon-Sorbonne in 2002, 2003, and 2004, respectively. He is now a research director at the Belgian Fonds National de la Recherche Scientifique (FNRS) and a lecturer at the Université catholique de Louvain. He is the editor-in-chief of Neural Processing Letters, the chairman of the annual European Symposium on Artificial Neural Networks (ESANN), an associate editor of the IEEE Transactions on Neural Networks, and a member of the editorial board and program committee of several journals and conferences on neural networks and learning. He is an author or coauthor of about 200 scientific papers in international journals and books or communications to conferences with reviewing committees. He is the coauthor of a scientific popularization book on artificial neural networks in the series Que Sais-Je?, in French.
His research interests include machine learning, artificial neural networks, self-organization, time-series forecasting, nonlinear statistics, adaptive signal processing, and high-dimensional data analysis. He is a senior member of the IEEE.