On the Necessary and Sufficient Conditions of a Meaningful Distance Function for High Dimensional Data Space


On the Necessary and Sufficient Conditions of a Meaningful Distance Function for High Dimensional Data Space

Chih-Ming Hsu (Electrical Engineering Department, National Taiwan University, Taipei, Taiwan. Email: ming@arbor.ee.ntu.edu.tw)
Ming-Syan Chen (Electrical Engineering Department, National Taiwan University, Taipei, Taiwan. Email: mschen@cc.ee.ntu.edu.tw)

Abstract

The use of effective distance functions has been explored for many data mining problems including clustering, nearest neighbor search, and indexing. Recent research results show that if the Pearson variation of the distance distribution converges to zero with increasing dimensionality, the distance function will become unstable (or meaningless) in high dimensional space, even with the commonly used L_p metric on the Euclidean space. This result has spawned many subsequent studies. We first comment that although the prior work provided the sufficient condition for the unstability of a distance function, the corresponding proof has some defects. Also, the necessary condition for unstability (i.e., the negation of the sufficient condition for the stability) of a distance function, which is required for function design, remains unknown. Consequently, we first provide in this paper a general proof for the sufficient condition of unstability. More importantly, we go further to prove that the rapid degradation of the Pearson variation of a distance distribution is in fact a necessary condition of the resulting unstability. With this result, we will then have the necessary and the sufficient conditions for unstability, which in turn imply the sufficient and necessary conditions for stability. This theoretical result leads to a powerful means to design a meaningful distance function. Explicitly, in light of our results, we design in this paper a meaningful distance function, called Shrinkage-Divergence Proximity (abbreviated as SDP), based on a given distance function. It is empirically shown that the SDP significantly outperforms prior measures for its being stable in high dimensional data space and robust to noise, and is thus deemed more suitable for distance-based clustering applications than the priorly used L_p metric.

1 Introduction

The curse of dimensionality has recently been studied extensively on several data mining problems such as clustering, nearest neighbor search, and indexing. The curse of high dimensionality is critical not only with regard to the performance issue but also to the quality issue. Specifically, on the quality issue, the design of effective distance functions has been deemed a very important and challenging issue. Recent research results showed that in high dimensional space, the concept of distance (or proximity) may not even be qualitatively meaningful [1][2][3][5][6][11]. Explicitly, the theorem in [6] showed that under a broad set of conditions, in high dimensional space, the distance to the nearest data point approaches the distance to the farthest data point of a given query point with increasing dimensionality. For example, under the independent and identically distributed dimensions assumption, the commonly used L_p metrics will encounter problems in high dimensionality. This theorem has spawned many subsequent studies along the same line [1][2][3][11][13]. The scenario is shown in Figure 1, where ε denotes a very small number. From the query point, the ratio of the distance to the nearest neighbor to that to the farthest neighbor is almost 1.
This phenomenon is called the unstable phenomenon [6] because there is poor discrimination between the nearest and farthest neighbors for a proximity query. As such, the nearest neighbor problem becomes poorly defined. Moreover, most indexing structures will have a rapid degradation with increasing dimensionality, which leads to an access to the entire database for any query [3]. Similar issues are encountered by distance-based clustering algorithms and classification algorithms that model the proximity for grouping data points into meaningful subclasses. In this paper, a distance function which will result in this unstable phenomenon is referred to as a meaningless function in high dimensional space, and is called meaningful otherwise. The result in [6] suggested that the design of a meaningful distance function in high dimensional space is a very important problem and has a significant impact on a wide variety of data mining applications. In [6], it is proved that the sufficient condition of unstability is that the Pearson variation of the corresponding distance distribution degrades to 0 with increasing dimensionality. For example, if, under the independent and identically distributed dimensions assumption and with the use of an L_p metric, the Pearson variation of the distance distribution rapidly degrades to 0 with increasing dimensionality in high dimensional space, then the unstable phenomenon occurs. (This distance function is hence called meaningless.)

Figure 1: An example of the unstable phenomenon: from the query point, DMAX ≤ (1 + ε) DMIN.

Figure 2: Example minimum and maximum of functions which are not continuous.

Note that in light of the equivalence between p → q and ¬q → ¬p, the negation of the above sufficient condition for unstability is equivalent to the necessary condition of stability (where we have a meaningful distance function). However, the sufficient condition for stability remains unknown. In fact, the important issue is how to design a meaningful distance (or proximity) function for high dimensional data space. The authors in [1] provided some practical desiderata for constructing a meaningful distance function, including (1) contrasting, (2) statistically sensitive, (3) skew magnification, and (4) compactness. These properties are in essence design guidelines for a meaningful distance function. However, we have no guarantee that a distance function which satisfies those needs will avoid the unstable phenomenon (since these properties are not sufficient conditions for stability). Consequently, neither the result in [6] nor that in [1] provides the necessary condition for unstability which is required for us to design a meaningful distance function in high dimensional space. The design of a meaningful distance (or proximity) function hence remains an open problem. This is the issue we shall solve in this paper.

We first comment that although the work in [6] provided the sufficient condition for unstability, the corresponding proof has some defects: [6] used the property that the minimum and maximum are continuous functions to deduce the sufficient condition for unstability, which, however, does not always hold. For example, consider the scenario of two functions f and g, as shown in Figure 2, where both the minimum and the maximum of f and g are discontinuous. In general, the minimum and maximum functions of a random sequence are not continuous, especially for discontinuous random variables [8]. To remedy this, we will first provide in this paper a general proof for the sufficient condition of unstability. More importantly, we go further to prove that the rapid degradation of the Pearson variation of a distance distribution is a necessary condition of the resulting unstability. With this result, we will then have the necessary and sufficient conditions for unstability, whose negations in turn imply the sufficient and necessary conditions for stability. Note that with the sufficient condition for stability, which was unsolved in prior works and is first derived in this paper, one will then be able to design a meaningful distance function for high dimensional space. Explicitly, this new result means that a distance function whose Pearson variation does not approach zero rapidly will be guaranteed to be meaningful (i.e., stable) in high dimensional space. The estimation of this variation is thus a guideline for testing whether a distance function is unstable or not. As such, this theoretical analysis leads to a powerful means to design a meaningful distance function. Explicitly, in light of our results, we design in this paper a meaningful distance function, called Shrinkage-Divergence Proximity (abbreviated as SDP), based on a given distance function. Specifically, the SDP defines an adaptive proximity function of two data points on individual attributes separately.
The proximity of two points is the aggregation of the per-attribute proximities. SDP magnifies the variation of the distance to detect and avoid the unstable phenomenon attribute by attribute. For each attribute of two data points, we shrink the proximity on this attribute to zero if the projected attribute values of the two data points fall into a small interval; if they are more similar to each other than to others on an attribute, we are not able to significantly discern among them statistically. On the other hand, if some projected attributes of two data points are far apart from one another over a long original distance, then the points are viewed as dissimilar to each other. Therefore, we will be able to spread them out to increase the degree of discrimination.

Note that since we define the proximity between two data points separately on individual attributes, the noise effects of some attributes will be mitigated by other attributes. This accounts for the reason that SDP is robust to noise in our experiments.

The contributions of this paper are twofold. First, as a theoretical foundation, we provide and prove the necessary and sufficient conditions of the unstable phenomenon in high dimensional space. Note that the negation of the necessary condition of unstability is in essence the sufficient condition of stability, which provides an innovative and effective guideline for the design of a meaningful (i.e., dimensionality resistant) distance function in high dimensional space. Second, in light of the theoretical results derived, we develop a new dimensionality resistant proximity function, SDP. It is empirically shown that the SDP significantly outperforms prior measures for its being stable in high dimensional data space and robust to noise, and is thus deemed more suitable for distance-based clustering applications than the commonly used L_p metric.

The rest of the paper is organized as follows. Section 2 describes related works. Section 3 provides the theoretical results of our work, where the necessary and sufficient conditions for the unstable phenomenon are derived. In Section 4, we devise a meaningful distance function, SDP. Experimental results are presented in Section 5. This paper concludes with Section 6.

2 Related Works

The use of effective distance functions has been explored in several data mining problems, including nearest neighbor search, indexing, and so on. As described earlier, the work in [6] showed that under a broad set of conditions the neighbor queries become unstable in high dimensional spaces. That is, from a given query point, the distance to the nearest data point will approach that to the farthest data point in high dimensional space. For example, under the commonly used assumption that each dimension is independent, the L_p metric will be unstable for many high dimensional data spaces. For a constant p (p ≥ 1), the L_p metric for two m-dimensional data points x_m = (x_1, x_2, ..., x_m) and y_m = (y_1, y_2, ..., y_m) is defined as L_p(x_m, y_m) = (Σ_{i=1}^m |x_i − y_i|^p)^{1/p}. The result in [6] has spawned many subsequent studies along this direction. [11] and [2] specifically examined the behavior of the L_p metric and showed that the problem of stability (i.e., meaningfulness) in high dimensionality is sensitive to the value of p. A property of L_p presented in [11] is that the value of the extremal difference DMAX_m − DMIN_m grows as m^{1/p − 1/2} with increasing dimensionality, where DMAX_m and DMIN_m are the distances to the farthest point and to the nearest point from the origin, respectively. As a result, the L_1 metric is the only metric of the L_p family for which the absolute difference between the nearest and farthest neighbors increases with the dimensionality. For the L_2 metric, DMAX_m − DMIN_m converges to a constant, and for the distance metrics L_k with k ≥ 3, DMAX_m − DMIN_m converges to zero with dimensionality. This means that the L_1 metric is preferable to the L_2 metric for high dimensional data mining applications. In [2], the authors also extended the notion of an L_p metric to a fractional distance function, where a fractional distance function dist_l for l ∈ (0, 1) is defined as dist_l(x_m, y_m) = (Σ_{i=1}^m |x_i − y_i|^l)^{1/l}. In [3], the authors proposed the IGrid-index, which is a method for similarity indexing. The IGrid-index uses a grid-based approach to redesign the similarity function from L_p.
In order to perform the proximity thresholding, the IGrid-index method discretizes the data space into k_d equi-depth ranges. Specifically, R[i, j] denotes the jth range for dimension i. For dimension i, if both x_i and y_i belong to the same range R[i, j], then the two data points are said to be in proximity on dimension i. Let S[x_m, y_m, k_d] be the proximity set for two data points x_m and y_m with a given level of discretization k_d; then the similarity between x_m and y_m is given by

IDist(x_m, y_m, k_d) = [Σ_{i ∈ S[x_m, y_m, k_d]} (1 − |x_i − y_i| / (m_i − n_i))^p]^{1/p},

where m_i and n_i are the upper and lower bounds of the corresponding range in the dimension i in which the data points x_m and y_m are in proximity to one another. Note that these results, while being valuable from various perspectives, do not provide the sufficient condition for a meaningful distance function that can be used for distance function design in high dimensional data space.
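As a concrete reference for the distance functions discussed above, the following minimal Python sketch (ours, not part of the original paper; names and parameter values are illustrative) implements the L_p metric and the fractional distance function dist_l of [2].

```python
import numpy as np

def lp_distance(x, y, p):
    """L_p metric for p >= 1: (sum_i |x_i - y_i|^p)^(1/p)."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def fractional_distance(x, y, l):
    """Fractional distance function of [2] for l in (0, 1)."""
    return np.sum(np.abs(x - y) ** l) ** (1.0 / l)

# A quick comparison on 1000-dimensional random points.
rng = np.random.default_rng(0)
x, y = rng.random(1000), rng.random(1000)
print(lp_distance(x, y, 1.0), lp_distance(x, y, 2.0), fractional_distance(x, y, 0.5))
```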

3 On a Meaningful Distance Function

In this section, we shall derive theoretical properties for a meaningful distance function. Preliminaries are given in Section 3.1. In Section 3.2, we first provide a new proof for the sufficient condition of unstability in Theorem 3.1 (since, as pointed out earlier, the proof in [6] has some defects). Then, we derive the necessary condition for unstability (i.e., the negation of the sufficient condition for stability) in Theorem 3.2. We state the complete results (the necessary and sufficient conditions for unstability) in Theorem 3.3. For better readability, we put the proofs of all theorems in Section 3.3. Remarks on the valid indices for a distance function to be meaningful are made in Section 3.4.

3.1 Definitions

We now introduce several terms to facilitate our presentation. Assume that d_m is a real-valued distance function (see Footnote 1) defined on a certain m-dimensional space, and that d_m is well defined as m increases. For example, the L_p metric defined on the m-dimensional Euclidean space is well defined as m increases for any p in (0, ∞). Let P_{m,i} (i = 1, 2, ..., N) be N independent data points sampled from some m-variate distribution F_m, where F_m is also well defined on some sample space as m increases. Note that we do not assume that the attributes of the data space are independent. (Essentially, many of the attributes are correlated with one another [1].) Let Q_m be an arbitrary (m-dimensional) query point chosen independently from all P_{m,i}. Let DMAX_m = max{d_m(P_{m,i}, Q_m) | 1 ≤ i ≤ N} and DMIN_m = min{d_m(P_{m,i}, Q_m) | 1 ≤ i ≤ N}. Hence DMAX_m and DMIN_m are random variables for any m.

Footnote 1: In this paper, the distance function need not be a metric. A nonnegative function d: X × X → R is a metric for a data space X if it satisfies the following properties: (1) d(x, y) ≥ 0 for all x, y ∈ X; (2) d(x, y) = 0 if and only if x = y; (3) d(x, y) = d(y, x) for all x, y ∈ X; and (4) d(x, z) + d(z, y) ≥ d(x, y) for all x, y, z ∈ X.

Definition 3.1. A family of well defined distance functions {d_m | m = 1, 2, ...} is δ-unstable (or δ-meaningless) if δ = sup{δ' | lim_{m→∞} P{DMAX_m ≤ (1 + ε) DMIN_m} ≥ δ' for any ε > 0}.

For ease of exposition, we also refer to 1-unstable as unstable (or with a meaningless distance function). As δ approaches 1, a small relative change in any query point in a direction away from the nearest neighbor could change that point into the farthest neighbor. For the purpose of this study, we shall explore the extremal case, the unstable phenomenon (i.e., δ = 1). A distance function is called stable if it is not unstable (i.e., δ < 1). A list of symbols used is given in Table 1.

Table 1: List of Symbols
m — Dimensionality of the data space
N — Number of data points
P{e} — Probability of an event e
X, Y, X_m, Y_m — Random variables defined on some probability space
E[X] — Expectation of a random variable X
var(X) — Variance of a random variable X
iid — Independent and identically distributed
d_m — A distance function of an m-dimensional data space (m = 1, 2, ...)
x ← F — x is a random sample point from the distribution F
X_n →_P X — A sequence of random variables X_1, X_2, ... converges in probability to a random variable X if for every ε > 0, lim_{n→∞} P{|X_n − X| ≤ ε} = 1

3.2 Theoretical Properties of Unstability

From probability theory, the unstable phenomenon is equivalent to the case that DMAX_m/DMIN_m converges in probability to one as m increases. The work in [6] proved that under the condition that the Pearson variation var(d_m(P_{m,1}, Q_m)/E[d_m(P_{m,1}, Q_m)]) of the distance distribution converges to 0 with increasing dimensionality, the extremal ratio DMAX_m/DMIN_m will also converge to one with increasing dimensionality. Formally, we have the following theorem. Recall that this theorem was rendered in [6]; we mainly provide a correct and more general proof in this paper.

Theorem 3.1. (Sufficient condition of unstability [6]) Let p be a constant (0 < p < ∞). If lim_{m→∞} var(d_m(P_{m,1}, Q_m)^p / E[d_m(P_{m,1}, Q_m)^p]) = 0, then for every ε > 0, lim_{m→∞} P{DMAX_m ≤ (1 + ε) DMIN_m} = 1.

From Theorem 3.1, a distance function is unstable in high dimensional space if the Pearson variation of its distance distribution approaches 0 with increasing dimensionality. This phenomenon leads to poor discrimination between the nearest and farthest neighbors for a proximity query in high dimensional space.
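The unstable phenomenon of Theorem 3.1 can be reproduced numerically. The sketch below (our illustration with arbitrary sample sizes, not an experiment from the paper) draws N iid uniform points and an independent query point, and prints the extremal ratio DMAX_m/DMIN_m under the L_2 metric; the ratio drifts toward 1 as m grows.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000  # number of data points

for m in (2, 10, 100, 1000, 10000):
    pts = rng.random((N, m))             # N iid points with U(0,1) coordinates
    q = rng.random(m)                    # independent query point
    d = np.linalg.norm(pts - q, axis=1)  # L_2 distances to the query
    print(m, d.max() / d.min())          # extremal ratio; approaches 1
```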
Note that, as mentioned in [6], the condition of Theorem 3.1 is applicable to a variety of data mining applications.

Example 1. Suppose that we have data whose distribution and query are iid from some distribution with finite fourth moments in all dimensions [6]. Let P_{m,ij} and Q_{m,j} be, respectively, the jth attribute of the ith data point and the jth attribute of the query point. Hence, the (P_{m,ij} − Q_{m,j})^2, j = 1, 2, ..., are iid with some expectation µ and some nonnegative variance σ^2 that are the same regardless of the values of i and m. If we use the L_2 metric for the proximity query, then d_m(P_{m,i}, Q_m) = (Σ_{j=1}^m (P_{m,ij} − Q_{m,j})^2)^{1/2}. Under the iid assumptions, we have E[d_m(P_{m,i}, Q_m)^2] = mµ and var(d_m(P_{m,i}, Q_m)^2) = mσ^2. Therefore, the Pearson variation var(d_m(P_{m,1}, Q_m)^2 / E[d_m(P_{m,1}, Q_m)^2]) = mσ^2/(mµ)^2 = σ^2/(mµ^2) converges to 0 with increasing dimensionality.

5 variation var d,1,q)2 E[d,1,Q ) 2 ] ) = σ2 /µ 2 converges to with increasing diensionality. Hence, the corresponding is eaningless under such a data space. In general, if the data distribution and query are iid fro soe distribution with finite 2pth oents in all diensions, the L p etric is eaningless for all 1 p < [6]. We next prove in Theore 3.2 that the condition for the earson variation of the distance distribution to any given target to converge to with increasing diensionality is not only the sufficient condition as stated in Theore 3.1) but also the necessary condition of unstability of a distance function in high diensional space. Theore 3.2. Necessary condition of unstability) If li { 1 + ɛ)dmin } = 1 for every ɛ >, then li var d,1, Q ) p E[d,1, Q ) p ) = < p < ). ] With Theore 3.2, we will then have the necessary and the sufficient conditions for unstability, whose negatives in turn iply the sufficient and necessary conditions for stability. The stateent that the earson variation of the distance distribution to any given target should converge to with increasing diensionality is equivalent to the unstable phenoenon. Following Theores 3.1 and 3.2, we reach Theore 3.3 below which provides a theoretical guideline for designing diensionality resistant distance functions. Theore 3.3. Main Theore) Let p be a constant < p < ). For every ɛ >, li { 1 + ɛ)dmin } = 1 if and only if li var d,1, Q ) p E[d,1, Q ) p ] ) =. Exaple 2. Theore 3.3 shows that we have to increase the variation of distance distribution for redesigning a diensionality resistant distance function. Assue that the data points and query are iid fro soe distribution in all diension, and E[d,i, Q ) p ] = µ, var D,1, Q ) p ) = σ 2 for soe constant µ and σ> ), as in Exaple 1. Since the exponential function e x for x ) is strictly convex, hence the Jensen s inequality [4] [15] suggests that the variation of distance distribution can be agnified by applying the exponential function. Let fx) = e x for x. We obtain soe interesting results by applying f to soe well-known distance distributions, as shown in Table 2. The above results show that the transforations of distance functions by the exponential function can reedy the eaningless behaviors on soe high diensional data spaces, especially for those cases that distance distributions have long tails. However, in such case, the stable property of distance functions ay not be useful for application needs. In addition, the large variation akes data sparsely, and then the concept of proxiity will also be difficult to visualize. Fro our experiental results, it is shown that the earson variations, whose distance functions are translated fro L 1 etric or etric by exponential function f, will start to diverge even when encountering only 1 diensions on any data spaces. We ake soe rearks on the valid indices for a distance function to be eaningful in Section roof of Main Theore In order to prove our ain theore, we need to drive soe properties of unstable phenoenon. For interest of space, those properties and soe iportant theores of probability are presented in Appendix A. roof for Theore 3.1: d Let V,i =,i,q ) p E[d,i,Q ) p ] and X = V,1, V,2,, V,N ). Hence, V,i i = 1, 2,, N are non-negative iid rando variables for all and V,i 1 as. 
3.3 Proof of Main Theorem

In order to prove our main theorem, we need to derive some properties of the unstable phenomenon. In the interest of space, those properties and some important theorems of probability are presented in Appendix A.

Proof of Theorem 3.1: Let V_{m,i} = d_m(P_{m,i}, Q_m)^p / E[d_m(P_{m,i}, Q_m)^p] and X_m = (V_{m,1}, V_{m,2}, ..., V_{m,N}). Hence, the V_{m,i} (i = 1, 2, ..., N) are non-negative iid random variables for all m, and V_{m,i} →_P 1 as m → ∞ (since E[V_{m,i}] = 1 and, by the hypothesis, var(V_{m,i}) → 0). Since for any ε > 0,

lim_{m→∞} P{|max(X_m) − 1| ≥ ε}
= lim_{m→∞} P{max(X_m) ≥ 1 + ε} + lim_{m→∞} P{max(X_m) ≤ 1 − ε}
= lim_{m→∞} (1 − P{V_{m,j} < 1 + ε, j = 1, 2, ..., N}) + lim_{m→∞} P{V_{m,j} ≤ 1 − ε, j = 1, 2, ..., N}  (by Theorem A.4)
= lim_{m→∞} (1 − Π_{j=1}^N P{V_{m,j} < 1 + ε}) + lim_{m→∞} Π_{j=1}^N P{V_{m,j} ≤ 1 − ε}  (since the V_{m,j}, j = 1, 2, ..., N, are iid, and by Theorem A.4)
= 0  (by V_{m,j} →_P 1 as m → ∞),

and

lim_{m→∞} P{|min(X_m) − 1| ≥ ε}
= lim_{m→∞} P{min(X_m) ≥ 1 + ε} + lim_{m→∞} P{min(X_m) ≤ 1 − ε}
= lim_{m→∞} P{V_{m,j} ≥ 1 + ε, j = 1, 2, ..., N} + lim_{m→∞} (1 − P{V_{m,j} > 1 − ε, j = 1, 2, ..., N})  (by Theorem A.4)
= lim_{m→∞} Π_{j=1}^N P{V_{m,j} ≥ 1 + ε} + lim_{m→∞} (1 − Π_{j=1}^N P{V_{m,j} > 1 − ε})  (since the V_{m,j}, j = 1, 2, ..., N, are iid, and by Theorem A.4)
= 0  (by V_{m,j} →_P 1 as m → ∞),

then max(X_m) and min(X_m) converge in probability to 1 as m → ∞. Further, by Slutsky's theorem, proposition 2 of Theorem A.1, and

DMAX_m / DMIN_m = (E[d_m(P_{m,i}, Q_m)^p] max(X_m))^{1/p} / (E[d_m(P_{m,i}, Q_m)^p] min(X_m))^{1/p} = (max(X_m) / min(X_m))^{1/p},

we conclude that DMAX_m/DMIN_m →_P 1 as m → ∞. Q.E.D.

Proof of Theorem 3.2: Let W_{m,i} = d_m(P_{m,i}, Q_m)^p. By the third property of Lemma A.1 and the second property of Theorem A.1, we have W_{m,i}/W_{m,j} →_P 1 for any i, j. Hence, by an application of Theorem A.3 and the fourth property of Lemma A.1, we have

var(W_{m,i}/W_{m,j}) = E[var(W_{m,i}/W_{m,j} | W_{m,j})] + var(E[W_{m,i}/W_{m,j} | W_{m,j}])

and lim_{m→∞} var(W_{m,i}/W_{m,j}) = 0. Since E[var(W_{m,i}/W_{m,j} | W_{m,j})] and var(E[W_{m,i}/W_{m,j} | W_{m,j}]) are nonnegative for all m, we have, respectively,

(3.1) lim_{m→∞} E[var(W_{m,i}/W_{m,j} | W_{m,j})] = 0, and
(3.2) lim_{m→∞} var(E[W_{m,i}/W_{m,j} | W_{m,j}]) = 0.

Also, var(W_{m,i}/W_{m,j} | W_{m,j} = x) ≥ 0 for all x and for all m; therefore Equation (3.1) implies that the probability of the set {x | var(W_{m,i}/W_{m,j} | W_{m,j} = x) > 0} must be 0, or var(W_{m,i}/W_{m,j} | W_{m,j} = x) → 0 for all x, as m → ∞. Set x = E[d_m(P_{m,i}, Q_m)^p]; hence we have lim_{m→∞} var(d_m(P_{m,i}, Q_m)^p / E[d_m(P_{m,i}, Q_m)^p]) = 0 for any i. Then lim_{m→∞} var(d_m(P_{m,1}, Q_m)^p / E[d_m(P_{m,1}, Q_m)^p]) = 0. (Note that the W_{m,i}, i = 1, 2, ..., are iid for all m.) Q.E.D.

The main theorem, i.e., Theorem 3.3, thus follows from Theorems 3.1 and 3.2.

3.4 Remarks on Indices to Test Meaningful Functions

Many researchers used the extremal ratio DMAX_m/DMIN_m [6] or the relative contrast (DMAX_m − DMIN_m)/DMIN_m [2] to test the stable phenomenon of distance functions. However, as shown by our experimental results, these indices are sensitive to outliers and are found to be inconsistent in many cases. Here, we will comment on the reason that the Pearson variation is a valid index to evaluate the stable phenomenon of distance functions. In light of Theorem 3.3, we have the relationships shown in Figure 3, which lead to the following four conceivable indices to test the stable (or unstable) phenomenon:

Index 1. (Extremal ratio) DMAX_m/DMIN_m;
Index 2. (Pearson variation) var(d_m(P_{m,1}, Q_m) / E[d_m(P_{m,1}, Q_m)]);
Index 3. (Mean of extremal ratio) E[DMAX_m/DMIN_m];
Index 4. (Variance of extremal ratio) var(DMAX_m/DMIN_m).

A robust meaningful distance function needs to satisfy lim_{m→∞} var(d_m(P_{m,1}, Q_m)/E[d_m(P_{m,1}, Q_m)]) > 0 independently of the distribution of the data set. Suppose that we use those indices to evaluate the meaningless behavior, for some given distance functions, over all possible data sets. If lim_{m→∞} DMAX_m/DMIN_m = 1 (see Footnote 2) or lim_{m→∞} var(d_m(P_{m,1}, Q_m)/E[d_m(P_{m,1}, Q_m)]) = 0, we can conclude that this distance function is meaningless. On the other hand, we can say that it is stable if lim_{m→∞} var(d_m(P_{m,1}, Q_m)/E[d_m(P_{m,1}, Q_m)]) > 0, lim_{m→∞} E[DMAX_m/DMIN_m] ≠ 1, or lim_{m→∞} var(DMAX_m/DMIN_m) > 0.

Footnote 2: Note that DMAX_m and DMIN_m are random variables. Therefore, this convergence is much stronger than convergence in probability [4][7][14][15].
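Under the above definitions, the four indices can be estimated from a data set by resampling query points. The following sketch is ours; the hold-one-out scheme, names, and trial count are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np

def stability_indices(data, dist, trials=100, seed=0):
    """Estimate Indices 1-4 by repeatedly holding out a random query point."""
    rng = np.random.default_rng(seed)
    n = len(data)
    ratios, variations = [], []
    for _ in range(trials):
        qi = rng.integers(n)
        others = data[np.arange(n) != qi]
        d = np.array([dist(p, data[qi]) for p in others])
        ratios.append(d.max() / d.min())            # extremal ratio for this query
        variations.append(d.var() / d.mean() ** 2)  # Pearson variation for this query
    return {
        "index1_extremal_ratio": ratios[0],                      # a single realization
        "index2_pearson_variation": float(np.mean(variations)),
        "index3_mean_extremal_ratio": float(np.mean(ratios)),
        "index4_var_extremal_ratio": float(np.var(ratios)),
    }

# Example: indices for the L_2 metric on 100-dimensional uniform data.
rng = np.random.default_rng(3)
print(stability_indices(rng.random((500, 100)), lambda a, b: np.linalg.norm(a - b)))
```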

Figure 3: The convergence relationships of the extremal ratio: the unstable phenomenon (lim_{m→∞} P{DMAX_m ≤ (1 + ε) DMIN_m} = 1) is equivalent to lim_{m→∞} var(d_m(P_{m,1}, Q_m)^p / E[d_m(P_{m,1}, Q_m)^p]) = 0, and implies both lim_{m→∞} E[DMAX_m/DMIN_m] = 1 and lim_{m→∞} var(DMAX_m/DMIN_m) = 0.

If we decide to apply some distance-based data mining algorithm to explore a given high dimensional data set, we first need to select a meaningful distance function depending on our application. Therefore, we may compute those indices for each candidate distance function, using the whole data set or a sampled subset, to evaluate its meaningful behavior. However, both the mean of the extremal ratio (Index 3) and the variance of the extremal ratio (Index 4) are invalid to estimate in this case. Also, though one can deduce that the distance function is meaningless if its DMAX_m/DMIN_m value is very close to one, it cannot be decided whether the function is meaningful or not if its value of the extremal ratio (Index 1) is apart from one. In addition, the extremal ratio (Index 1) is sensitive to outliers. On the other hand, if we apply some resampling technique, such as the bootstrap [10], to test the stable property of distance functions for some given high dimensional data set, the extremal ratio (Index 1) could be inconsistent over many sampled subsets. Consequently, in light of the theoretical results derived and also as will be validated, the Pearson variation (Index 2) emerges as the index to use to evaluate the meaningfulness of a distance function.

4 Shrinkage-Divergence Proximity

As deduced before, the unstable phenomenon is rooted in the variation of the distance distribution. In light of Theorem 3.3, we will next design a new proximity function, called SDP (Shrinkage-Divergence Proximity), based on some well defined family of distance functions {d_m | m = 1, 2, ...}.

4.1 Definition of SDP

Let f be a non-negative real function defined on the set of non-negative real numbers such that

f_{a,b}(x) = 0 if x < a;  f_{a,b}(x) = x if a ≤ x < b;  f_{a,b}(x) = e^x otherwise.

For any m-dimensional data points x_m = (x_1, x_2, ..., x_m) and y_m = (y_1, y_2, ..., y_m), we define the SDP function as

SDP_G(x_m, y_m) = Σ_{i=1}^m w_i f_{s_{i1},s_{i2}}(d_1(x_i, y_i)).

The general form of SDP, denoted by SDP_G, defines a distance function f_{s_{i1},s_{i2}} between x_m and y_m on each individual attribute. The parameters w_i (i = 1, 2, ..., m) are determined by domain knowledge subject to the importance of attribute i for application needs. Also, the parameters s_{i1} and s_{i2} for attribute i depend on the distribution of the data points projected on the ith dimension. In many situations, we have no prior knowledge on the weights of importance among attributes, and do not know the distribution of each attribute either. In such a case, the general SDP function, i.e., SDP_G, degenerates to

SDP_{s_1,s_2}(x_m, y_m) = Σ_{i=1}^m f_{s_1,s_2}(d_1(x_i, y_i)).

In this paper, we only discuss the properties and applications of this SDP. The parameters s_1 and s_2 are called, respectively, the shrinkage threshold and the divergence threshold. For illustrative purposes, we present in Figure 4 some 2-dimensional balls of center (0, 0) with different radii for SDP, the fractional function, the L_1 metric, and the L_2 metric.
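The piecewise map f_{s_1,s_2} and the degenerate form SDP_{s_1,s_2} translate directly into code. The following minimal sketch is ours, not the authors' implementation; it takes d_1 to be the absolute difference on each attribute.

```python
import numpy as np

def f_shrink_diverge(x, s1, s2):
    """Piecewise map f_{s1,s2}: 0 below the shrinkage threshold s1, the identity
    on [s1, s2), and exponential divergence e^x at or above s2."""
    x = np.asarray(x, dtype=float)
    return np.where(x < s1, 0.0, np.where(x < s2, x, np.exp(x)))

def sdp(x, y, s1=0.5, s2=0.6):
    """Degenerate SDP_{s1,s2}: the sum of f_{s1,s2}(d_1(x_i, y_i)) over attributes,
    with d_1 taken as the absolute difference on each attribute."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(f_shrink_diverge(np.abs(x - y), s1, s2).sum())

# Attributes closer than s1 contribute nothing; attributes farther than s2 diverge.
print(sdp([0.10, 0.55, 3.0], [0.15, 0.0, 0.0]))  # 0 + 0.55 + e^3, about 20.64
```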

4.2 Properties of SDP

We next discuss the properties of SDP for similarity search and data clustering problems.

Figure 4: The 2-dimensional balls of center (0, 0) for different radii: (a) SDP (s_1 = 0.5, s_2 = 0.6); (b) the fractional function; (c) the L_1 metric; (d) the L_2 metric.

Proposition of SDP.
1. If d_1 is the L_1 metric defined on the Euclidean space, then SDP_{s_1,s_2} is equivalent to L_1 as s_1 → 0 and s_2 → ∞.
2. SDP_{s_1,s_2}(x_m, y_m) = 0 if and only if d_1(x_i, y_i) < s_1 for all i.
3. SDP_{s_1,s_2}(x_m, y_m) ≥ m e^{s_2} if d_1(x_i, y_i) ≥ s_2 for all i.

The first property shows that SDP is a generalized form of the L_1 metric. The second property means that the SDP is similar to grid approaches [3] in that all data points within a small rectangle are more similar to each other than to others; thus, we cannot significantly discern among them statistically, and it is therefore reasonable to shrink their proximity to zero. In order to construct a noise insensitive proximity function, and to avoid over-magnifying the distance variation, the SDP defines an adaptive proximity of two data points on individual attributes. For two data points, if the values of an attribute are projected into the same small interval, we shrink the proximity of this attribute to zero. On the other hand, if all projected attributes of two data points are far apart from one another over a long original distance, then the two points are dissimilar to each other, and we are able to spread them out to increase discrimination. The SDP can remedy the edge effects problem of the grid approach [3], which is caused by two adjacent grids that may contain data points very close to one another. It is worth mentioning that, the same as the fractional function [2] and the IDist function of the IGrid-index [3], the SDP function is in essence not a metric with the triangle inequality. It can be verified that SDP_{s_1,s_2}(x_m, y_m) = 0 does not imply x_m = y_m for s_1 > 0, and the triangle inequality does not hold in general. However, the influence of the triangle inequality is usually insignificant in many clustering applications [9][12], in particular for high dimensional space.

Statistical View. Here, we examine the perspectives of SDP for clustering applications statistically. Assume that the data distributions are independent in all dimensions of attributes. Given a small nonnegative number ε, let s_{i1} be the maximum value (or supremum) such that P{d_1(x_i, y_i) ≤ s_{i1}} ≤ ε if x_m and y_m belong to distinct clusters. Similarly, let s_{i2} be the minimum value (or infimum) such that P{d_1(x_i, y_i) ≥ s_{i2}} ≤ ε if x_m and y_m belong to the same cluster. Set s_1 = min{s_{i1} | i = 1, 2, ..., m} and s_2 = max{s_{i2} | i = 1, 2, ..., m}. Then, both

P{SDP_{s_1,s_2}(x_m, y_m) = 0 | x_m and y_m belong to distinct clusters} (≤ ε^m) and
P{SDP_{s_1,s_2}(x_m, y_m) ≥ m e^{s_2} | x_m and y_m belong to the same cluster} (≤ ε^m)

will approach 0 as m is large. Furthermore, SDP is insensitive to noise in high dimensional space, because SDP treats individual attributes separately; the noise effects of some attributes will be mitigated by the other attributes. Also, the SDP is able to avoid spreading the data points too sparsely for discrimination. Overall, SDP has better discrimination power than the original distance function, and is hence more proper for distance-based clustering algorithms.

5 Experimental Evaluation

To assess the stableness and the performance of the SDP function, we have conducted a series of experiments. We compare in our study the stable behaviors and the performances for distance-based clustering of SDP with several well-known distance functions, including the L_1 metric, the L_2 metric, and the fractional distance function dist_l.

Meaningful Behaviors. First we compared the stable behavior of SDP, which is based on the L_1 metric, with several widely used distance functions.
We use the Pearson variation (Index 2): var(d_m(P_{m,1}, Q_m)/E[d_m(P_{m,1}, Q_m)]), the mean of the extremal ratio (Index 3): E[DMAX_m/DMIN_m], and the variance of the extremal ratio (Index 4): var(DMAX_m/DMIN_m) to evaluate the stability of those distance functions.

Recall that comments on these indices and their use are given in Section 3.4.

Figure 5: The average value of the Pearson variation: (a) SDP; (b) the L_1, L_2, and fractional functions.

The synthetic m-dimensional sample data sets and query points of our experiments were generated as follows. Each data set includes one thousand independent data points. For each data set, the jth entry x_ij of the ith data point x_i = (x_i1, x_i2, ..., x_im) was randomly sampled from the uniform distribution U(a, a + b), the exponential distribution Exp(λ), or the normal distribution N(µ, σ). In each generation of x_ij, the parameters a, b, λ, µ, and σ are randomly sampled from uniform distributions with ranges (0, 1), (0, 1), (0.1, 2), (0, 1), and (0.1, 1), respectively. Formally, for each data set, for any i = 1, 2, ..., 1000 and any j = 1, 2, ..., m,

x_ij ~ U(a, a + b) with probability 1/3, Exp(λ) with probability 1/3, N(µ, σ) with probability 1/3,

where a, b, µ ~ U(0, 1), λ ~ U(0.1, 2), and σ ~ U(0.1, 1). The query points were also generated in the same manner; a sketch of this generation procedure in code is given below. We repeated this process to generate 100 data sets for each dimensionality. The dimensionality varied from one to 100. The shrinkage threshold and divergence threshold of SDP are s_1 = 0.5 and s_2 = 0.6, respectively. The parameter l for the fractional distance function dist_l was set to 0.3. The estimations of E[DMAX_m/DMIN_m] and var(DMAX_m/DMIN_m) and the average value of var(d_m(P_{m,1}, Q_m)/E[d_m(P_{m,1}, Q_m)]) were computed in natural logarithm scale to measure the meaningful behavior.

Figure 6: The estimations of the mean of the extremal ratio (in logarithmic scale): (a) SDP; (b) the L_1, L_2, and fractional functions.
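For reference, here is a compact generator following the description above. This is our sketch; since several constants in the transcription are uncertain, the sizes are left as parameters.

```python
import numpy as np

def synthetic_dataset(n_points, m, rng):
    """Each entry is drawn from U(a, a+b), Exp(lambda), or N(mu, sigma), with the
    distribution and its parameters chosen independently for every entry."""
    shape = (n_points, m)
    choice = rng.integers(0, 3, size=shape)
    a, b = rng.random(shape), rng.random(shape)       # a, b ~ U(0, 1)
    lam = rng.uniform(0.1, 2.0, size=shape)           # lambda ~ U(0.1, 2)
    mu, sigma = rng.random(shape), rng.uniform(0.1, 1.0, size=shape)
    return np.select(
        [choice == 0, choice == 1, choice == 2],
        [rng.uniform(a, a + b),           # uniform component
         rng.exponential(1.0 / lam),      # Exp(lambda) has scale 1/lambda
         rng.normal(mu, sigma)])          # normal component

rng = np.random.default_rng(4)
data = synthetic_dataset(1000, 50, rng)   # 1000 points in 50 dimensions
query = synthetic_dataset(1, 50, rng)[0]  # a query point, generated the same way
```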

Those results are shown in Figure 5 through Figure 7. (In order to perceive the differences among these performances easily, we only present the outputs of the first 40 (or 20) dimensions.) As shown in these figures, the SDP is an effective dimensionality resistant distance function. It is noted that the L_1 and L_2 metrics become unstable with as few as 20 dimensions. The fractional distance function is more effective at preserving the meaningfulness of proximity than L_1 and L_2, but starts to suffer from unstability after the dimensionality exceeds 80. In contrast, the SDP remains stable even if the dimensionality is greater than 100, showing the prominent advantage of using SDP.

Figure 7: The estimations of the variance of the extremal ratio (in logarithmic scale): (a) SDP; (b) the L_1, L_2, and fractional functions.

Clustering Applications. In order to compare the qualitative performances, we applied the SDP function and the L_2 metric for clustering on multivariate mixture normal models. We synthesized many mixtures of two m-variate Gaussian data sets with diagonal covariance matrices. All dimensions of Cluster 1 and Cluster 2 are sampled iid from the normal distributions N(µ_1, σ_1^2) and N(µ_2, σ_2^2), respectively. We applied the matrix powering algorithm [16] on those generated data sets to compare the performances of SDP with the L_2 metric. The outline of the matrix powering algorithm is given in Table 3 [16].

Table 3: The matrix powering algorithm.
Algorithm: Matrix Powering Algorithm
Inputs: A (pairwise distance matrix), ε
Output: A partition of the data points into clusters
1. Compute A^2 = A × A.
2. For each pair of yet unclassified points (i, j):
   a. If (A^2_i − A^2_j)(A^2_i − A^2_j)^T < ε, then i and j are in the same cluster.
   b. If (A^2_i − A^2_j)(A^2_i − A^2_j)^T ≥ ε, then i and j are in different clusters.

For any matrix M, we use M_j, M_ij, and M^T to denote, respectively, the jth row of M, the ij-th entry of M, and the transpose of M. Let the precision ratio of the algorithm be the percentage of the N^2 pairwise relationships (classified as same or different cluster) that it partitions correctly. Suppose that the expectation of A_{i,j} is p (respectively, q) for data points i and j in the same cluster (respectively, in different clusters), with q > p. Then, the optimal threshold given in [16] is ε = (q − p)^2 N^{3/2} / 2. However, the knowledge of q − p is usually unavailable. In addition, the scales of SDP and the L_2 metric vary. We therefore modify the threshold as ε(k) = kN(DMAX − DMIN)/2 for a variant k, and search the r-feasible range, which is defined as the maximal interval of k such that the precision ratio is at least r with threshold ε(k). We empirically investigated the behaviors of L_2 and SDP by using the above matrix powering algorithm; a sketch of this procedure in code is given below. The shrinkage and divergence thresholds of SDP were set to 0.5 and 6(σ_1 + σ_2), respectively. First, we used 100-dimensional synthetic data sets drawn from the mixture of two normal distributions with varying mean difference µ_1 − µ_2 between the clusters. Each data set has 200 data points, and each cluster contains 100 data points. The results are shown in Figure 8(a). We also considered 100-dimensional multivariate mixture models with several variance ratios σ_1/σ_2, and show the results in Figure 8(b). Finally, we tested both SDP and the L_2 metric for searching feasible ranges with increasing dimensionality; the empirical results are shown in Figure 9. Note that to use the matrix powering algorithm to solve our data clustering problems, we first need to choose an optimal threshold. A wider feasible range offers a more adequate solution space to this problem.
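A direct transcription of Table 3 into code follows (our sketch; the greedy labeling and the epsilon handling are simplifications of the procedure in [16]).

```python
import numpy as np

def matrix_powering_clusters(A, eps):
    """Matrix powering algorithm of [16]: square the pairwise distance matrix and
    put i, j in the same cluster when the rows of A^2 differ by less than eps."""
    A2 = A @ A
    n = len(A)
    labels = np.full(n, -1)
    next_label = 0
    for i in range(n):
        if labels[i] == -1:          # open a new cluster for point i
            labels[i] = next_label
            next_label += 1
        for j in range(i + 1, n):
            diff = A2[i] - A2[j]
            if labels[j] == -1 and diff @ diff < eps:  # same-cluster test of Table 3
                labels[j] = labels[i]
    return labels
```

Scanning the threshold ε(k) = kN(DMAX − DMIN)/2 over k and recording where the precision ratio stays above r then yields the r-feasible range.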

Figure 8: (a) The 98%-feasible ranges for SDP and L_2 with a varied mean difference µ_1 − µ_2 of the two clusters (σ_1 = 1, µ_2 = 3, σ_2 = 4); (b) the 98%-feasible ranges for SDP and L_2 with varied variance ratios σ_1/σ_2 of the two clusters (µ_1 = µ_2 = 3).

As shown in these figures, the feasible ranges of the L_2 metric are much narrower, and the boundary points of its feasible ranges are very close to 0. On the other hand, the feasible ranges of SDP are much wider than those of L_2, even in high dimensional space. Further, the SDP responds appropriately to the varying characteristics of the clusters. A larger mean difference (or variance ratio) of two clusters implies a better discrimination between them. As shown in Figure 8, the feasible range of SDP becomes wider as the mean difference or the variance ratio of the two clusters is magnified. Also, the width of the feasible ranges for SDP increases with increasing dimensionality, as in Figure 9. On the other hand, due to the unstable phenomenon, the width of the feasible ranges for L_2 rapidly degrades to 0 with increasing dimensionality. From Figure 8 and Figure 9, it is shown that SDP significantly outperforms the priorly used L_2 metric.

Figure 9: The 98%-feasible ranges for SDP and L_2 with varied dimensionality ((µ_1 = 8, σ_1 = 1) vs. (µ_1 = 3, σ_1 = 4)).

6 Conclusions

In this paper, we derived the necessary and the sufficient conditions for the stability of a distance function in high dimensional space. Explicitly, we proved that the rapidly degraded Pearson variation of the distance distribution with increasing dimensionality is equivalent to (i.e., is the necessary and sufficient condition of) the unstable phenomenon. This theoretical result on the sufficient condition of a meaningful distance function design leads to a powerful means to test the stability of a distance function in high dimensional data space. Explicitly, in light of our results, we have designed a meaningful distance function, SDP, based on a certain given distance function. It was empirically shown that the SDP significantly outperforms prior measures for its being stable in high dimensional data space and also robust to noise, and is thus deemed more suitable for distance-based clustering applications than the priorly used L_p metric.

References

[1] C. C. Aggarwal. Re-designing Distance Functions and Distance-based Applications for High Dimensional Data. ACM SIGMOD Record, Vol. 30, pp. 13-18, 2001.
[2] C. C. Aggarwal, A. Hinneburg, and D. Keim. On the Surprising Behavior of Distance Metrics in High Dimensional Space. ICDT Conference, 2001.
[3] C. C. Aggarwal and P. S. Yu. The IGrid Index: Reversing the Dimensionality Curse for Similarity Indexing in High Dimensional Space. ACM SIGKDD Conference, 2000.
[4] P. J. Bickel and K. A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, Vol. 1, 2nd edition. Prentice Hall, 2001.
[5] K. P. Bennett, U. Fayyad, and D. Geiger. Density-Based Indexing for Approximate Nearest Neighbor Queries. ACM SIGKDD Conference, 1999.
[6] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When Is "Nearest Neighbor" Meaningful? ICDT Conference Proceedings, 1999.
[7] K. L. Chung. A Course in Probability Theory. 3rd edition. Academic Press, 2001.
[8] H. A. David and H. N. Nagaraja. Order Statistics. 3rd edition. Wiley & Sons, 2003.
[9] L. Ertöz, M. Steinbach, and V. Kumar. Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data. SDM Conference, 2003.
[10] B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall, 1993.
[11] A. Hinneburg, C. C. Aggarwal, and D. Keim. What Is the Nearest Neighbor in High Dimensional Spaces? VLDB Conference, 2000.
[12] A. K. Jain, M. N. Murty, and P. J. Flynn. Data Clustering: A Review. ACM Computing Surveys, 1999.
[13] N. Katayama and S. Satoh. Distinctiveness-Sensitive Nearest-Neighbor Search for Efficient Similarity Retrieval of Multimedia Information. ICDE Conference, 2001.
[14] V. V. Petrov. Limit Theorems of Probability Theory. Oxford University Press, 1995.
[15] V. K. Rohatgi and A. K. Md. Ehsanes Saleh. An Introduction to Probability and Statistics. 2nd edition. Wiley, 2001.
[16] H. Zhou and D. Woodruff. Clustering via Matrix Powering. PODS Conference, 2004.

Appendix A: Related Theorems on Probability

In order to prove our main theorem, we present some important theorems from probability theory [15][7][8]. The proofs are omitted in the interest of space.

Theorem A.1. If X_m →_P X, Y_m →_P Y, and g is a continuous function defined on the real numbers, then we have the following properties:
1. X_m − X →_P 0.
2. g(X_m) →_P g(X).
3. aX_m ± Y_m →_P aX ± Y for any constant a.
4. X_m Y_m →_P XY.
5. X_m / Y_m →_P X/a, provided Y_m →_P a (≠ 0) (Slutsky's theorem).

Theorem A.2. If X_m →_P 1 then X_m^{-1} →_P 1.

Theorem A.3. If E[X^2] < ∞ then var(X) = var(E[X | Y]) + E[var(X | Y)].

Theorem A.4. If X_j (j = 1, 2, ..., m) are independent, then P{max(X_1, ..., X_m) ≤ ε} = P{X_1 ≤ ε, X_2 ≤ ε, ..., X_m ≤ ε} = Π_{j=1}^m P{X_j ≤ ε}.

In order to prove the necessary condition, we also need to derive the following lemma.

Lemma A.1. For every ε > 0, if lim_{m→∞} P{DMAX_m ≤ (1 + ε) DMIN_m} = 1, then we have the following properties:
1. DMAX_m/DMIN_m →_P 1.
2. lim_{m→∞} E[DMAX_m/DMIN_m] = 1 and lim_{m→∞} var(DMAX_m/DMIN_m) = 0.
3. For any i, j, lim_{m→∞} P{d_m(P_{m,i}, Q_m) ≤ (1 + ε) d_m(P_{m,j}, Q_m)} = 1.
4. For any i, j, lim_{m→∞} E[d_m(P_{m,i}, Q_m)/d_m(P_{m,j}, Q_m)] = 1 and lim_{m→∞} var(d_m(P_{m,i}, Q_m)/d_m(P_{m,j}, Q_m)) = 0.

Proof.
1. The first proposition follows from Theorem A.2.
2. From probability theory [15], the following properties are all equivalent: (a) lim_{m→∞} P{DMAX_m ≤ (1 + ε) DMIN_m} = 1; (b) DMAX_m/DMIN_m →_P 1; (c) DMAX_m/DMIN_m − 1 converges in distribution to the degenerate distribution D(x), where D(x) = 1 if x > 0 and D(x) = 0 if x ≤ 0. Hence, we have lim_{m→∞} E[DMAX_m/DMIN_m] = 1 and lim_{m→∞} var(DMAX_m/DMIN_m) = 0.
3. Since DMIN_m/DMAX_m ≤ d_m(P_{m,i}, Q_m)/d_m(P_{m,j}, Q_m) ≤ DMAX_m/DMIN_m for any i, j, we have d_m(P_{m,i}, Q_m)/d_m(P_{m,j}, Q_m) →_P 1.
4. The third proposition also leads to the fourth one.


Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion Suppleentary Material for Fast and Provable Algoriths for Spectrally Sparse Signal Reconstruction via Low-Ran Hanel Matrix Copletion Jian-Feng Cai Tianing Wang Ke Wei March 1, 017 Abstract We establish

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for

More information

Weighted- 1 minimization with multiple weighting sets

Weighted- 1 minimization with multiple weighting sets Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University

More information

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine CSCI699: Topics in Learning and Gae Theory Lecture October 23 Lecturer: Ilias Scribes: Ruixin Qiang and Alana Shine Today s topic is auction with saples. 1 Introduction to auctions Definition 1. In a single

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

Least Squares Fitting of Data

Least Squares Fitting of Data Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a

More information

Testing equality of variances for multiple univariate normal populations

Testing equality of variances for multiple univariate normal populations University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Inforation Sciences 0 esting equality of variances for ultiple univariate

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS ISSN 1440-771X AUSTRALIA DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS An Iproved Method for Bandwidth Selection When Estiating ROC Curves Peter G Hall and Rob J Hyndan Working Paper 11/00 An iproved

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint. 59 6. ABSTRACT MEASURE THEORY Having developed the Lebesgue integral with respect to the general easures, we now have a general concept with few specific exaples to actually test it on. Indeed, so far

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics, EPFL, Lausanne Phone: Fax:

A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics, EPFL, Lausanne Phone: Fax: A general forulation of the cross-nested logit odel Michel Bierlaire, EPFL Conference paper STRC 2001 Session: Choices A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics,

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

FAST DYNAMO ON THE REAL LINE

FAST DYNAMO ON THE REAL LINE FAST DYAMO O THE REAL LIE O. KOZLOVSKI & P. VYTOVA Abstract. In this paper we show that a piecewise expanding ap on the interval, extended to the real line by a non-expanding ap satisfying soe ild hypthesis

More information

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies OPERATIONS RESEARCH Vol. 52, No. 5, Septeber October 2004, pp. 795 803 issn 0030-364X eissn 1526-5463 04 5205 0795 infors doi 10.1287/opre.1040.0130 2004 INFORMS TECHNICAL NOTE Lost-Sales Probles with

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

Cosine similarity and the Borda rule

Cosine similarity and the Borda rule Cosine siilarity and the Borda rule Yoko Kawada Abstract Cosine siilarity is a coonly used siilarity easure in coputer science. We propose a voting rule based on cosine siilarity, naely, the cosine siilarity

More information

When Short Runs Beat Long Runs

When Short Runs Beat Long Runs When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long

More information

Prediction by random-walk perturbation

Prediction by random-walk perturbation Prediction by rando-walk perturbation Luc Devroye School of Coputer Science McGill University Gábor Lugosi ICREA and Departent of Econoics Universitat Popeu Fabra lucdevroye@gail.co gabor.lugosi@gail.co

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements 1 Copressive Distilled Sensing: Sparse Recovery Using Adaptivity in Copressive Measureents Jarvis D. Haupt 1 Richard G. Baraniuk 1 Rui M. Castro 2 and Robert D. Nowak 3 1 Dept. of Electrical and Coputer

More information

Prerequisites. We recall: Theorem 2 A subset of a countably innite set is countable.

Prerequisites. We recall: Theorem 2 A subset of a countably innite set is countable. Prerequisites 1 Set Theory We recall the basic facts about countable and uncountable sets, union and intersection of sets and iages and preiages of functions. 1.1 Countable and uncountable sets We can

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Optimal Jamming Over Additive Noise: Vector Source-Channel Case

Optimal Jamming Over Additive Noise: Vector Source-Channel Case Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper

More information

Spine Fin Efficiency A Three Sided Pyramidal Fin of Equilateral Triangular Cross-Sectional Area

Spine Fin Efficiency A Three Sided Pyramidal Fin of Equilateral Triangular Cross-Sectional Area Proceedings of the 006 WSEAS/IASME International Conference on Heat and Mass Transfer, Miai, Florida, USA, January 18-0, 006 (pp13-18) Spine Fin Efficiency A Three Sided Pyraidal Fin of Equilateral Triangular

More information

On the Use of A Priori Information for Sparse Signal Approximations

On the Use of A Priori Information for Sparse Signal Approximations ITS TECHNICAL REPORT NO. 3/4 On the Use of A Priori Inforation for Sparse Signal Approxiations Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst Signal Processing Institute ITS) Ecole Polytechnique

More information

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5,

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, 2015 31 11 Motif Finding Sources for this section: Rouchka, 1997, A Brief Overview of Gibbs Sapling. J. Buhler, M. Topa:

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are,

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are, Page of 8 Suppleentary Materials: A ultiple testing procedure for ulti-diensional pairwise coparisons with application to gene expression studies Anjana Grandhi, Wenge Guo, Shyaal D. Peddada S Notations

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

Necessity of low effective dimension

Necessity of low effective dimension Necessity of low effective diension Art B. Owen Stanford University October 2002, Orig: July 2002 Abstract Practitioners have long noticed that quasi-monte Carlo ethods work very well on functions that

More information

RAFIA(MBA) TUTOR S UPLOADED FILE Course STA301: Statistics and Probability Lecture No 1 to 5

RAFIA(MBA) TUTOR S UPLOADED FILE Course STA301: Statistics and Probability Lecture No 1 to 5 Course STA0: Statistics and Probability Lecture No to 5 Multiple Choice Questions:. Statistics deals with: a) Observations b) Aggregates of facts*** c) Individuals d) Isolated ites. A nuber of students

More information