On the Surprising Behavior of Distance Metrics in High Dimensional Space


Charu C. Aggarwal (1), Alexander Hinneburg (2), and Daniel A. Keim (2)

(1) IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA. charu@watson.ibm.com
(2) Institute of Computer Science, University of Halle, Kurt-Mothes-Str. 1, 06120 Halle (Saale), Germany. {hinneburg, keim}@informatik.uni-halle.de

Abstract. In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from an efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, the concept of proximity, distance or nearest neighbor may not even be qualitatively meaningful. In this paper, we view the dimensionality curse from the point of view of the distance metrics which are used to measure the similarity between objects. We specifically examine the behavior of the commonly used L_k norm and show that the problem of meaningfulness in high dimensionality is sensitive to the value of k. For example, this means that the Manhattan distance metric (L_1 norm) is consistently more preferable than the Euclidean distance metric (L_2 norm) for high dimensional data mining applications. Using the intuition derived from our analysis, we introduce and examine a natural extension of the L_k norm to fractional distance metrics. We show that the fractional distance metric provides more meaningful results both from the theoretical and empirical perspective. The results show that fractional distance metrics can significantly improve the effectiveness of standard clustering algorithms such as the k-means algorithm.

1 Introduction

In recent years, high dimensional search and retrieval have become very well studied problems because of the increased importance of data mining applications [1], [2], [3], [4], [5], [8], [10], [11].
Typically, most real applications which require the use of such techniques comprise very high dimensional data. For such applications, the curse of high dimensionality tends to be a major obstacle in the development of data mining techniques in several ways. For example, the performance of similarity indexing structures in high dimensions degrades rapidly, so that each query requires the access of almost all the data [1].

J. Van den Bussche and V. Vianu (Eds.): ICDT 2001, LNCS 1973, pp. 420-434, 2001. (c) Springer-Verlag Berlin Heidelberg 2001

On the Surprising Behavior of Distance Metrics 421

It has been argued in [6] that under certain reasonable assumptions on the data distribution, the ratio of the distances of the nearest and farthest neighbors to a given target in high dimensional space is almost 1 for a wide variety of data distributions and distance functions. In such a case, the nearest neighbor problem becomes ill defined, since the contrast between the distances to different data points does not exist. In such cases, even the concept of proximity may not be meaningful from a qualitative perspective: a problem which is even more fundamental than the performance degradation of high dimensional algorithms.

In most high dimensional applications the choice of the distance metric is not obvious, and the notion used for the calculation of similarity is very heuristical. Given the non-contrasting nature of the distribution of distances to a given query point, different measures may provide very different orders of proximity of points to a given query point. There is very little literature on providing guidance for choosing the correct distance measure which results in the most meaningful notion of proximity between two records. Many high dimensional indexing structures and algorithms use the Euclidean distance metric as a natural extension of its traditional use in two- or three-dimensional spatial applications.

In this paper, we discuss the general behavior of the commonly used L_k norm (for x, y in R^d and integer k, L_k(x, y) = (Σ_{i=1}^d |x_i - y_i|^k)^{1/k}) in high dimensional space. The L_k norm distance function is also susceptible to the dimensionality curse for many classes of data distributions [6]. Our recent results [9] seem to suggest that the L_k-norm may be more relevant for k = 1 or k = 2 than for values of k >= 3. In this paper, we provide some surprising theoretical and experimental results in analyzing the dependency of the L_k norm on the value of k. More specifically, we show that the relative contrasts of the distances to a query point depend heavily on the L_k metric used.
This provides considerable evidence that the meaningfulness of the L_k norm worsens faster with increasing dimensionality for higher values of k. Thus, for a given problem with a fixed (high) value of the dimensionality d, it may be preferable to use lower values of k. This means that the L_1 distance metric (the Manhattan distance metric) is the most preferable for high dimensional applications, followed by the Euclidean metric (L_2), then the L_3 metric, and so on. Encouraged by this trend, we examine the behavior of fractional distance metrics, in which k is allowed to be a fraction smaller than 1. We show that this metric is even more effective at preserving the meaningfulness of proximity measures. We back up our theoretical results with empirical tests on real and synthetic data showing that the results provided by fractional distance metrics are indeed practically useful. Thus, the results of this paper have strong implications for the choice of distance metrics for high dimensional data mining problems. We specifically show the improvements which can be obtained by applying fractional distance metrics to the standard k-means algorithm.

This paper is organized as follows. In the next section, we provide a theoretical analysis of the behavior of the L_k norm in very high dimensionality. In section 3, we discuss fractional distance metrics and provide a theoretical analysis of their behavior. In section 4, we provide the empirical results, and section 5 provides summary and conclusions.
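For concreteness, the L_k norm just discussed can be written in a few lines of Python (a sketch of ours, assuming numpy; `dist_k` is an illustrative name, and values of k below 1 give the fractional variants examined later in the paper):

```python
import numpy as np

def dist_k(x, y, k):
    """L_k distance (sum_i |x_i - y_i|^k)^(1/k); k < 1 yields the
    fractional 'metrics' studied here (the triangle inequality then
    fails, but the function still ranks neighbors)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** k) ** (1.0 / k))

a, b = [0.1, 0.9, 0.4], [0.5, 0.2, 0.8]
print(dist_k(a, b, 1))    # Manhattan (L_1)
print(dist_k(a, b, 2))    # Euclidean (L_2)
print(dist_k(a, b, 0.3))  # fractional, f = 0.3
```

Note that for k < 1 the same formula is used; only the exponent changes.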

422 C.C. Aggarwal, A. Hinneburg, and D.A. Keim

2 Behavior of the L_k-Norm in High Dimensionality

In order to present our convergence results, we first establish some notations and definitions in Table 1.

Table 1. Notations and Basic Definitions

d: Dimensionality of the data space
N: Number of data points
F: 1-dimensional data distribution in (0, 1)
X_d: Data point from F^d with each coordinate drawn from F
dist_k^d(x, y): Distance between (x_1, ..., x_d) and (y_1, ..., y_d) using the L_k metric, dist_k^d(x, y) = (Σ_{i=1}^d |x_i - y_i|^k)^{1/k}
||·||_k: Distance of a vector to the origin (0, ..., 0) using the function dist_k^d
Dmax_d^k = max ||X_d||_k: Farthest distance of the N points to the origin using the distance metric L_k
Dmin_d^k = min ||X_d||_k: Nearest distance of the N points to the origin using the distance metric L_k
E[X], var[X]: Expected value and variance of a random variable X
Y_d →_p c: A vector sequence Y_1, ..., Y_d converges in probability to a constant vector c if: for all ε > 0, lim_{d→∞} P[dist_d(Y_d, c) ≤ ε] = 1

Theorem 1 (Beyer et al., adapted for the L_k metric). If lim_{d→∞} var(||X_d||_k / E[||X_d||_k]) = 0, then (Dmax_d^k - Dmin_d^k)/Dmin_d^k →_p 0.

Proof. See [6] for the proof of a more general version of this result.

The result of the theorem [6] shows that the difference between the maximum and minimum distances to a given query point does not increase as fast as the nearest distance to any point in high dimensional space. This makes a proximity query meaningless and unstable, because there is poor discrimination between the nearest and furthest neighbor. Henceforth, we will refer to the ratio (Dmax_d^k - Dmin_d^k)/Dmin_d^k as the relative contrast. The results in [6] use the relative contrast as an interesting criterion for meaningfulness. In order to provide more insight, in the following we analyze the behavior for different distance metrics in high-dimensional space. We first assume a uniform distribution of data points and show our results for N = 2 points. Then, we generalize the results to an arbitrary number of points and arbitrary distributions. In this paper, we consistently use the origin as the query point.
This choice does not affect the generality of our results, though it simplifies our algebra considerably.
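The instability described by Theorem 1 is easy to observe numerically. The following sketch (our illustration, assuming numpy; not code from the paper) estimates the relative contrast for uniform data with the origin as the query point:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(d, N=1000, k=2):
    """(Dmax - Dmin) / Dmin for N uniform points in [0,1]^d under L_k."""
    X = rng.random((N, d))
    dists = np.sum(X ** k, axis=1) ** (1.0 / k)  # L_k distances to the origin
    return (dists.max() - dists.min()) / dists.min()

for d in (2, 20, 200, 2000):
    print(d, relative_contrast(d))  # contrast collapses as d grows
```

At d = 2 the contrast is large (the nearest of 1000 points lies close to the origin), while at d = 2000 the nearest and furthest distances are nearly indistinguishable.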

Lemma 1. Let F be the uniform distribution of N = 2 points. For an L_k metric,
lim_{d→∞} E[(Dmax_d^k - Dmin_d^k)/d^{1/k-1/2}] = C · (1/(k+1))^{1/k} · √(1/(2k+1)), where C is some constant.

Proof. Let A and B be the two points in a d-dimensional data distribution such that each coordinate is independently drawn from a 1-dimensional data distribution F with finite mean and standard deviation. Specifically A = (P_1 ... P_d) and B = (Q_1 ... Q_d), with P_i and Q_i being drawn from F. Let PA_d = (Σ_{i=1}^d (P_i)^k)^{1/k} be the distance of A to the origin using the L_k metric and PB_d = (Σ_{i=1}^d (Q_i)^k)^{1/k} the distance of B. The difference of distances is PA_d - PB_d = (Σ_{i=1}^d (P_i)^k)^{1/k} - (Σ_{i=1}^d (Q_i)^k)^{1/k}.

It can be shown (2) that the random variable (P_i)^k has mean 1/(k+1) and standard deviation σ = √(1/(2k+1) - (1/(k+1))^2) = k/((k+1)·√(2k+1)). This means that (PA_d)^k/d →_p 1/(k+1) and (PB_d)^k/d →_p 1/(k+1), and therefore

PA_d/d^{1/k} →_p (1/(k+1))^{1/k},  PB_d/d^{1/k} →_p (1/(k+1))^{1/k}.   (1)

We intend to show that E[|PA_d - PB_d|]/d^{1/k-1/2} converges to C · (1/(k+1))^{1/k} · √(1/(2k+1)). We can express PA_d - PB_d in the following numerator/denominator form, which we will use in order to examine the convergence behavior of the numerator and denominator individually:

PA_d - PB_d = ((PA_d)^k - (PB_d)^k) / (Σ_{r=0}^{k-1} (PA_d)^r · (PB_d)^{k-1-r}).   (2)

Dividing both sides by d^{1/k-1/2} and regrouping the right-hand side we get:

(PA_d - PB_d)/d^{1/k-1/2} = (((PA_d)^k - (PB_d)^k)/√d) / (Σ_{r=0}^{k-1} (PA_d/d^{1/k})^r · (PB_d/d^{1/k})^{k-1-r}).   (3)

Consequently, using Slutsky's theorem (3) and the results of Equation 1 we obtain

Σ_{r=0}^{k-1} (PA_d/d^{1/k})^r · (PB_d/d^{1/k})^{k-1-r} →_p k · (1/(k+1))^{(k-1)/k}.   (4)

Having characterized the convergence behavior of the denominator of the right-hand side of Equation 3, let us now examine the behavior of the numerator: ((PA_d)^k - (PB_d)^k)/√d = Σ_{i=1}^d ((P_i)^k - (Q_i)^k)/√d = Σ_{i=1}^d R_i/√d. Here R_i is the new random variable defined by (P_i)^k - (Q_i)^k for i in {1, ..., d}. This random variable has zero mean and standard deviation √2·σ, where σ is the standard deviation of (P_i)^k.

(2) This is because E[(P_i)^k] = 1/(k+1) and E[(P_i)^{2k}] = 1/(2k+1).
(3) Slutsky's Theorem: Let Y_1 ... Y_d ... be a sequence of random vectors and h(·) be a continuous function. If Y_d →_p c then h(Y_d) →_p h(c).

The sum of the d different values of R_i will converge to a normal distribution with mean 0 and standard deviation √2·σ·√d because of the central limit theorem. Consequently, the mean average deviation of this distribution will be C·σ·√d for some constant C. Therefore, we have:

lim_{d→∞} E[|(PA_d)^k - (PB_d)^k| / √d] = C · σ.   (5)

Since the denominator of Equation 3 shows probabilistic convergence, we can combine the results of Equations 4 and 5 to obtain

lim_{d→∞} E[|PA_d - PB_d| / d^{1/k-1/2}] = (C·σ/k) · (k+1)^{(k-1)/k} = C · (1/(k+1))^{1/k} · √(1/(2k+1)),   (6)

where the last equality uses σ = k/((k+1)·√(2k+1)).

We can easily generalize the result for a database of N uniformly distributed points. The following corollary provides the result.

Corollary 1. Let F be the uniform distribution of N = n points. Then,
C · (1/(k+1))^{1/k} · √(1/(2k+1)) ≤ lim_{d→∞} E[(Dmax_d^k - Dmin_d^k)/d^{1/k-1/2}] ≤ C · (n-1) · (1/(k+1))^{1/k} · √(1/(2k+1)).

Proof. This is because if L is the expected difference between the maximum and minimum of two randomly drawn points, then the same value for n points drawn from the same distribution must be in the range (L, (n-1)·L).

The results can be modified for arbitrary distributions of N points in a database by introducing a constant factor C_k. In that case, the general dependency of Dmax - Dmin on d^{1/k-1/2} remains unchanged. A detailed proof is provided in the Appendix; a short outline of the reasoning behind the result is available in [9].

Lemma 2. [9] Let F be an arbitrary distribution of N = 2 points. Then, lim_{d→∞} E[(Dmax_d^k - Dmin_d^k)/d^{1/k-1/2}] = C_k, where C_k is some constant dependent on k.

Corollary 2. Let F be the arbitrary distribution of N = n points. Then, C_k ≤ lim_{d→∞} E[(Dmax_d^k - Dmin_d^k)/d^{1/k-1/2}] ≤ (n-1) · C_k.

Thus, this result shows that in high dimensional space Dmax_d^k - Dmin_d^k increases at the rate of d^{1/k-1/2}, independent of the data distribution. This means that for the Manhattan distance metric (k = 1), the value of this expression diverges to infinity; for the Euclidean distance metric (k = 2), the expression is bounded by constants; whereas for all other distance metrics, it converges to 0 (see Figure 1). Furthermore, the convergence is faster when the value of k of the L_k metric increases.
This provides the insight that higher norm parameters provide poorer contrast between the furthest and nearest neighbor. Even more insight may be obtained by examining the exact behavior of the relative contrast, as opposed to the absolute distance between the furthest and nearest point.
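The d^{1/k-1/2} scaling above can be checked by simulation. This sketch (our illustration, assuming numpy) estimates E[|Dmax_d^k - Dmin_d^k|] for N = 2 uniform points; the gap diverges for k = 1, stays roughly constant for k = 2, and shrinks for k = 3:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_abs_gap(d, k, trials=2000):
    """Monte Carlo estimate of E[|Dmax - Dmin|] for N = 2 uniform points."""
    P = rng.random((trials, d))
    Q = rng.random((trials, d))
    dp = np.sum(P ** k, axis=1) ** (1.0 / k)  # L_k distance of A to origin
    dq = np.sum(Q ** k, axis=1) ** (1.0 / k)  # L_k distance of B to origin
    return float(np.mean(np.abs(dp - dq)))

# Lemma 1 predicts growth like d^(1/k - 1/2): divergence for k = 1,
# boundedness for k = 2, convergence to 0 for k >= 3.
for k in (1, 2, 3):
    print(k, [round(mean_abs_gap(d, k), 3) for d in (10, 100, 1000)])
```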

Fig. 1. |Dmax_d^k - Dmin_d^k| depending on d for different metrics (uniform data); panels: (a) k = 3, (b) k = 2, (c) k = 1, (d) f = 2/3, (e) f = 2/5.

Table 2. Effect of dimensionality on the relative (L_1 and L_2) behavior of relative contrast

Dimensionality   P[U < T]
1                Both metrics are the same
4                91.3%
5                96.1%

Theorem 2. Let F be the uniform distribution of N = 2 points. Then,
lim_{d→∞} E[((Dmax_d^k - Dmin_d^k)/Dmin_d^k) · √d] = C · √(1/(2k+1)).

Proof. Let A, B, P_1 ... P_d, Q_1 ... Q_d, PA_d, PB_d be defined as in the proof of Lemma 1. We have shown in the proof of the previous result that PA_d/d^{1/k} →_p (1/(k+1))^{1/k}. Using Slutsky's theorem we can derive that:

min{PA_d, PB_d} / d^{1/k} →_p (1/(k+1))^{1/k}.   (7)

We have also shown in the previous result that:

lim_{d→∞} E[|PA_d - PB_d| / d^{1/k-1/2}] = C · (1/(k+1))^{1/k} · √(1/(2k+1)).   (8)

We can combine the results in Equations 7 and 8 to obtain:

lim_{d→∞} E[(|PA_d - PB_d| / min{PA_d, PB_d}) · √d] = C · √(1/(2k+1)).   (9)

Note that the above results confirm the results in [6], because they show that the relative contrast degrades as 1/√d for the different distance norms. Note

Fig. 2. Relative contrast variation with norm parameter for the uniform distribution (curves for several values of N).

Fig. 3. Unit spheres for different fractional metrics (2D).

that for values of d in the reasonable range of data mining applications, the norm dependent factor of √(1/(2k+1)) may play a valuable role in affecting the relative contrast. For such cases, even the relative rate of degradation of the different distance metrics for a given data set at the same value of the dimensionality may be important. In Figure 2 we have illustrated the relative contrast created by an artificially generated data set drawn from a uniform distribution in d = 20 dimensions. Clearly, the relative contrast decreases with increasing value of k, and also follows the same trend as √(1/(2k+1)).

Another interesting aspect which can be explored to improve nearest neighbor and clustering algorithms in high-dimensional space is the effect of d on the relative contrast. Even though the expected relative contrast always decreases with increasing dimensionality, this may not necessarily be true for a given data set and different k. To show this, we performed the following experiment on the Manhattan (L_1) and Euclidean (L_2) distance metrics: Let U = (Dmax_d^2 - Dmin_d^2)/Dmin_d^2 and T = (Dmax_d^1 - Dmin_d^1)/Dmin_d^1. We performed some empirical tests to calculate the value of P[U < T] for the case of the Manhattan (L_1) and Euclidean (L_2) distance metrics for N = 10 points drawn from a uniform distribution. In each trial, U and T were calculated from the same set of N = 10 points, and P[U < T] was calculated by finding the fraction of times U was less than T in 1000 trials. The results of the experiment are given in Table 2. It is clear that with increasing dimensionality, the value of P[U < T] continues to increase.
Thus, for higher dimensionality, the relative contrast provided by a norm with smaller parameter is more likely to dominate another with a larger parameter. For dimensionalities of 20 or higher it is clear that the Manhattan distance metric provides a significantly higher relative contrast than the Euclidean distance metric with very high probability. Thus, among the distance metrics with integral norms, the Manhattan distance metric is the method of choice for providing the best contrast between the different points. This result of our analysis can be directly used in a number of different applications.
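The U-versus-T experiment above is straightforward to replicate. The sketch below (our own reconstruction of the described setup, assuming numpy; not the authors' code or data) estimates P[U < T] for N = 10 uniform points:

```python
import numpy as np

rng = np.random.default_rng(2)

def prob_U_less_T(d, N=10, trials=1000):
    """Fraction of trials in which the L_1 relative contrast T exceeds
    the L_2 relative contrast U, for N uniform points in [0,1]^d."""
    wins = 0
    for _ in range(trials):
        X = rng.random((N, d))
        d1 = np.sum(X, axis=1)                # L_1 distances to the origin
        d2 = np.sqrt(np.sum(X ** 2, axis=1))  # L_2 distances to the origin
        T = (d1.max() - d1.min()) / d1.min()
        U = (d2.max() - d2.min()) / d2.min()
        wins += U < T
    return wins / trials

for d in (2, 5, 20):
    print(d, prob_U_less_T(d))  # increases with dimensionality
```

At d = 1 the two norms coincide, so P[U < T] is only meaningful for d >= 2.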

3 Fractional Distance Metrics

The result of the previous section, that the Manhattan metric (k = 1) provides the best discrimination in high-dimensional data spaces, is the motivation for looking into distance metrics with k < 1. We call these metrics fractional distance metrics. A fractional distance metric dist_f^d (L_f norm) for f in (0, 1) is defined as:

dist_f^d(x, y) = (Σ_{i=1}^d |x_i - y_i|^f)^{1/f}.

To give an intuition of the behavior of the fractional distance metric, we plotted in Figure 3 the unit spheres for different fractional metrics in R^2. We will prove most of our results in this section assuming that f is of the form 1/l, where l is some integer. The reason that we show the results for this special case is that we are able to use nice algebraic tricks for the proofs. The natural conjecture from the smooth continuous variation of dist_f^d with f is that the results are also true for arbitrary values of f. (4) Our results provide considerable insights into the behavior of the fractional distance metric and its relationship with the L_k-norm for integral values of k.

Lemma 3. Let F be the uniform distribution of N = 2 points and f = 1/l for some integer l. Then,
lim_{d→∞} E[(Dmax_d^f - Dmin_d^f)/d^{1/f-1/2}] = C · (1/(f+1))^{1/f} · √(1/(2f+1)).

Proof. Let A, B, P_1 ... P_d, Q_1 ... Q_d, PA_d, PB_d be defined using the L_f metric as they were defined in Lemma 1 for the L_k metric. Let further QA_d = (PA_d)^f = Σ_{i=1}^d (P_i)^f and QB_d = (PB_d)^f = Σ_{i=1}^d (Q_i)^f. Analogous to Lemma 1, QA_d/d →_p 1/(f+1) and QB_d/d →_p 1/(f+1).

We intend to show that E[|PA_d - PB_d|]/d^{1/f-1/2} converges to C · (1/(f+1))^{1/f} · √(1/(2f+1)). The difference of distances is PA_d - PB_d = (Σ_{i=1}^d (P_i)^f)^{1/f} - (Σ_{i=1}^d (Q_i)^f)^{1/f} = (QA_d)^l - (QB_d)^l. Note that the above expression is of the form a^l - b^l = (a - b) · (Σ_{r=0}^{l-1} a^r · b^{l-1-r}). Therefore, PA_d - PB_d can be written as (Σ_{i=1}^d ((P_i)^f - (Q_i)^f)) · (Σ_{r=0}^{l-1} (QA_d)^r · (QB_d)^{l-1-r}).
By dividing both sides by d^{1/f-1/2} and regrouping the right hand side we get:

(PA_d - PB_d)/d^{1/f-1/2} = (Σ_{i=1}^d ((P_i)^f - (Q_i)^f)/√d) · (Σ_{r=0}^{l-1} (QA_d/d)^r · (QB_d/d)^{l-1-r}).   (10)

By using the convergence results for QA_d/d and QB_d/d in Equation 10 together with Slutsky's theorem (the second factor converges in probability), we can derive that:

(PA_d - PB_d)/d^{1/f-1/2} →_p (Σ_{i=1}^d ((P_i)^f - (Q_i)^f)/√d) · (l/(1+f)^{l-1}).   (11)

(4) Empirical simulations of the relative contrast show this is indeed the case.

The random variable (P_i)^f - (Q_i)^f has zero mean and standard deviation √2·σ, where σ is the standard deviation of (P_i)^f. The sum of the d different values of (P_i)^f - (Q_i)^f will converge to a normal distribution with mean 0 and standard deviation √2·σ·√d because of the central limit theorem. Consequently, the expected mean average deviation of this normal distribution is C·σ·√d for some constant C. Therefore, we have:

lim_{d→∞} E[|(PA_d)^f - (PB_d)^f| / √d] = C · σ = C · (f/(f+1)) · √(1/(2f+1)).   (12)

Combining the results of Equations 12 and 11, we get:

lim_{d→∞} E[|PA_d - PB_d| / d^{1/f-1/2}] = C · (1/(f+1))^{1/f} · √(1/(2f+1)).   (13)

A direct consequence of the above result is the following generalization to N = n points.

Corollary 3. Let F be the uniform distribution of N = n points and f = 1/l for some integer l. Then, for some constant C we have:
C · (1/(f+1))^{1/f} · √(1/(2f+1)) ≤ lim_{d→∞} E[(Dmax_d^f - Dmin_d^f)/d^{1/f-1/2}] ≤ C · (n-1) · (1/(f+1))^{1/f} · √(1/(2f+1)).

Proof. Similar to Corollary 1.

The above result shows that the absolute difference between the maximum and minimum for the fractional distance metric increases at the rate of d^{1/f-1/2}. Thus, the smaller the fraction f, the greater the rate of absolute divergence between the maximum and minimum value. Now, we will examine the relative contrast of the fractional distance metric.

Theorem 3. Let F be the uniform distribution of N = 2 points and f = 1/l for some integer l. Then,
lim_{d→∞} E[((Dmax_d^f - Dmin_d^f)/Dmin_d^f) · √d] = C · √(1/(2f+1)) for some constant C.

Proof. Analogous to the proof of Theorem 2.

The following is the direct generalization to N = n points.

Corollary 4. Let F be the uniform distribution of N = n points, and f = 1/l for some integer l. Then, for some constant C,
C · √(1/(2f+1)) ≤ lim_{d→∞} E[((Dmax_d^f - Dmin_d^f)/Dmin_d^f) · √d] ≤ C · (n-1) · √(1/(2f+1)).

Proof. Analogous to the proof of Corollary 1.
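The dependence of the relative contrast on f at fixed dimensionality (the trend of Figure 2, and the √(1/(2f+1)) factor of Theorem 3) can be sketched numerically (our illustration, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_relative_contrast(d, f, N=100, trials=200):
    """Average (Dmax - Dmin) / Dmin under the L_f distance, uniform data."""
    vals = []
    for _ in range(trials):
        X = rng.random((N, d))
        dist = np.sum(X ** f, axis=1) ** (1.0 / f)  # L_f distances to origin
        vals.append((dist.max() - dist.min()) / dist.min())
    return float(np.mean(vals))

# smaller norm parameter => larger relative contrast at the same d
for f in (0.25, 0.5, 1.0, 2.0):
    print(f, round(mean_relative_contrast(20, f), 3))
```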

This result is true for the case of arbitrary values of f (not just f = 1/l) and N, but the use of these specific values of f helps considerably in the simplification of the proof of the result. The empirical simulation in Figure 2 shows the behavior for arbitrary values of f and N. The curve for each value of N is different, but all curves fit the general trend of reduced contrast with increased value of f. Note that the value of the relative contrast for both the integral distance metric L_k and the fractional distance metric L_f is the same in the boundary case f = k = 1.

The above results show that fractional distance metrics provide better contrast than integral distance metrics, both in terms of the absolute distributions of points to a given query point and in terms of relative distances. This is a surprising result in light of the fact that the Euclidean distance metric is traditionally used in a large variety of indexing structures and data mining applications. The widespread use of the Euclidean distance metric stems from the natural extension of applicability to spatial database systems (many multidimensional indexing structures were initially proposed in the context of spatial systems). However, from the perspective of high dimensional data mining applications, this natural interpretability in 2- or 3-dimensional spatial systems is completely irrelevant. Whether the theoretical behavior of the relative contrast also translates into practically useful implications for high dimensional data mining applications is an issue which we will examine in greater detail in the next section.

4 Empirical Results

In this section, we show that our surprising findings can be directly applied to improve existing mining techniques for high-dimensional data. For the experiments, we used synthetic and real data. The synthetic data consists of a number of clusters (data inside the clusters follow a normal distribution, and the cluster centers are uniformly distributed).
The advantage of the synthetic data sets is that the clusters are clearly separated, and any clustering algorithm should be able to identify them correctly. For our experiments we used one of the most widely used standard clustering algorithms, the k-means algorithm. The data set used in the experiments consists of 6 clusters with 10000 data points each and no noise. The dimensionality was chosen to be 20. The results of our experiments show that the fractional distance metrics provide a much higher classification rate, which is about 99% for the fractional distance metric with f = 0.3 versus 89% for the Euclidean metric (see Figure 4). The detailed results, including the confusion matrices obtained, are provided in the appendix.

For the experiments with real data sets, we used some of the classification problems from the UCI machine learning repository (5). All of these problems are classification problems which have a large number of feature variables, and a special variable which is designated as the class label. We used the following simple experiment: For each of the cases that we tested on, we stripped off the

(5) http://mlearn
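A k-means variant that assigns points under an L_f distance, in the spirit of this experiment, might be sketched as follows (a simplified illustration of ours, not the authors' implementation; `kmeans_lf` is a made-up name, numpy assumed; centers are still updated as coordinate-wise means, as in standard k-means):

```python
import numpy as np

rng = np.random.default_rng(4)

def kmeans_lf(X, n_clusters, f=0.3, iters=50):
    """k-means-style clustering that assigns each point to the nearest
    center under the L_f distance."""
    centers = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    for _ in range(iters):
        # pairwise L_f distances, shape (n_points, n_clusters)
        d = np.sum(np.abs(X[:, None, :] - centers[None, :, :]) ** f,
                   axis=2) ** (1.0 / f)
        labels = d.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):          # guard against empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# toy demo: two well-separated Gaussian blobs in 20 dimensions
X = np.vstack([rng.normal(0.2, 0.02, (100, 20)),
               rng.normal(0.8, 0.02, (100, 20))])
labels, _ = kmeans_lf(X, 2, f=0.3)
```

Only the assignment step changes relative to the standard algorithm; since x -> x^{1/f} is monotone, the assignment could equally be done on the un-rooted sums.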

Fig. 4. Effectiveness of k-means (classification rate vs. distance parameter)

class variable from the data set and considered the feature variables only. The query points were picked from the original database, and the closest l neighbors were found for each target point using different distance metrics. The technique was tested using the following two measures:

1. Class Variable Accuracy: This was the primary measure that we used in order to test the quality of the different distance metrics. Since the class variable is known to depend in some way on the feature variables, the proximity of objects belonging to the same class in feature space is evidence of the meaningfulness of a given distance metric. The specific measure that we used was the total number of the l nearest neighbors that belonged to the same class as the target object, over all the different target objects. Needless to say, we do not intend to propose this rudimentary unsupervised technique as an alternative to classification models, but use the classification performance only as evidence of the meaningfulness (or lack of meaningfulness) of a given distance metric. The class labels may not necessarily always correspond to locality in feature space; therefore the meaningfulness results presented are evidential in nature. However, a consistent effect on the class variable accuracy with increasing norm parameter does tend to be a powerful way of demonstrating qualitative trends.

2. Noise Stability: How does the quality of the distance metric vary with more or less noisy data? We used noise masking in order to evaluate this aspect. In noise masking, each entry in the database was replaced by a random entry with masking probability p_c. The random entry was chosen from a uniform distribution centered at the mean of that attribute. Thus, when p_c is 1, the data is completely noisy. We studied how each of the two problems was affected by noise masking.
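The two evaluation measures above might be sketched as follows (our own illustrative reconstruction, not the authors' code; numpy assumed; the demo data is made up, not a UCI set):

```python
import numpy as np

rng = np.random.default_rng(5)

def class_variable_accuracy(X, y, k_norm, l=5):
    """Fraction of the l nearest neighbors (under L_k) sharing the
    target's class label, averaged over all targets."""
    hits = 0
    for i in range(len(X)):
        d = np.sum(np.abs(X - X[i]) ** k_norm, axis=1) ** (1.0 / k_norm)
        d[i] = np.inf                        # exclude the target itself
        hits += int(np.sum(y[np.argsort(d)[:l]] == y[i]))
    return hits / (len(X) * l)

def noise_mask(X, p):
    """Replace each entry, with probability p, by a random value from a
    uniform distribution centered at that attribute's mean."""
    noise = X.mean(axis=0) + rng.uniform(-0.5, 0.5, size=X.shape)
    return np.where(rng.random(X.shape) < p, noise, X)

# toy demo with two labeled blobs
X = np.vstack([rng.normal(0.25, 0.03, (50, 10)),
               rng.normal(0.75, 0.03, (50, 10))])
y = np.array([0] * 50 + [1] * 50)
print(class_variable_accuracy(X, y, 0.5))                   # clean data
print(class_variable_accuracy(noise_mask(X, 0.9), y, 0.5))  # heavy masking
```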
In Table 3, we have illustrated some examples of the variation in performance for different distance metrics. Except for a few exceptions, the major trend in this table is that the accuracy performance decreases with increasing value of the norm parameter. We have shown the table in the range L_0.1 to L_10 because it was easiest to calculate the distance values without exceeding the numerical ranges in the computer representation. We have also illustrated the accuracy performance when the L_∞ metric is used. One interesting observation is that the accuracy with the L_∞ distance metric is often worse than the accuracy value obtained by picking a record from the database at random and reporting the corresponding target

value. This trend is observed because of the fact that the L_∞ metric only looks at the dimension at which the target and neighbor are furthest apart. In high dimensional space, this is likely to be a very poor representation of the nearest neighbor. A similar argument is true for L_k distance metrics (for high values of k), which give undue importance to the distant (sparse/noisy) dimensions. It is precisely this aspect which is reflected in our theoretical analysis of the relative contrast, and which results in distance metrics with high norm parameters discriminating poorly between the furthest and nearest neighbor.

In Figure 5, we have shown the variation in the accuracy of the class variable matching with k, when the L_k norm is used. The accuracy on the Y-axis is reported as the ratio of the accuracy to that of a completely random matching scheme. The graph is averaged over all the data sets of Table 3. It is easy to see that there is a clear trend of the accuracy worsening with increasing values of the parameter k.

We also studied the robustness of the scheme to the use of noise masking. For this purpose, we have illustrated the performance of three distance metrics in Figure 6: L_0.1, L_1, and L_10, for various values of the masking probability on the machine data set. On the X-axis, we have denoted the value of the masking probability, whereas on the Y-axis we have the accuracy ratio to that of a completely random matching scheme.

Table 3. Number of correct class label matches between nearest neighbor and target

Data Set             L_0.1  L_0.5  L_1  L_2  L_4  L_10  L_inf  Random
Machine              -      -      -    -    -    -     -      -
Musk                 -      -      -    -    -    -     -      -
Breast Cancer (wbc)  -      -      -    -    -    -     -      -
Segmentation         -      -      -    -    -    -     -      -
Ionosphere           -      -      -    -    -    -     -      -

Fig. 5. Accuracy depending on the norm parameter (accuracy ratio to random matching vs. parameter of the distance norm used).

Fig. 6. Accuracy depending on noise masking (accuracy ratio for L_0.1, L_1, and L_10 vs. noise masking probability).
Note that when the masking probability is 1, any scheme would degrade to a random method. However, it is interesting to see from Figure 6 that the L_10 distance metric degrades much faster to the

random performance (at a masking probability of 0.4), whereas the L_1 metric degrades to random at 0.6. The L_0.1 distance metric is most robust to the presence of noise in the data set and degrades to random performance at the slowest rate. These results are closely connected to our theoretical analysis, which shows the rapid lack of discrimination between the nearest and furthest distances for high values of the norm parameter because of the undue weighting given to the noisy dimensions, which contribute the most to the distance.

5 Conclusions and Summary

In this paper, we showed some surprising results on the qualitative behavior of the different distance metrics for measuring proximity in high dimensionality. We demonstrated our results in both a theoretical and empirical setting. In the past, not much attention has been paid to the choice of distance metrics used in high dimensional applications. The results of this paper are likely to have a powerful impact on the particular choice of distance metric which is used for problems such as clustering, categorization, and similarity search, all of which depend upon some notion of proximity.

References

1. Weber R., Schek H.-J., Blott S.: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB Conference Proceedings, 1998.
2. Bennett K. P., Fayyad U., Geiger D.: Density-Based Indexing for Approximate Nearest Neighbor Queries. ACM SIGKDD Conference Proceedings, 1999.
3. Berchtold S., Böhm C., Kriegel H.-P.: The Pyramid Technique: Towards Breaking the Curse of Dimensionality. ACM SIGMOD Conference Proceedings, June 1998.
4. Berchtold S., Böhm C., Keim D., Kriegel H.-P.: A Cost Model for Nearest Neighbor Search in High Dimensional Space. ACM PODS Conference Proceedings, 1997.
5. Berchtold S., Ertl B., Keim D., Kriegel H.-P., Seidl T.: Fast Nearest Neighbor Search in High Dimensional Spaces. ICDE Conference Proceedings, 1998.
6. Beyer K., Goldstein J., Ramakrishnan R., Shaft U.: When is Nearest Neighbors Meaningful? ICDT Conference Proceedings, 1999.
7. Shaft U., Goldstein J., Beyer K.: Nearest Neighbor Query Performance for Unstable Distributions. Technical Report TR 388, Department of Computer Science, University of Wisconsin at Madison.
8. Guttman, A.: R-Trees: A Dynamic Index Structure for Spatial Searching. ACM SIGMOD Conference Proceedings, 1984.
9. Hinneburg A., Aggarwal C., Keim D.: What is the Nearest Neighbor in High Dimensional Spaces? VLDB Conference Proceedings, 2000.
10. Katayama N., Satoh S.: The SR-Tree: An Index Structure for High Dimensional Nearest Neighbor Queries. ACM SIGMOD Conference Proceedings, 1997.
11. Lin K.-I., Jagadish H. V., Faloutsos C.: The TV-Tree: An Index Structure for High Dimensional Data. VLDB Journal, Volume 3, Number 4, 1994.

Appendix

Here we provide a detailed proof of Lemma 2, which proves our modified convergence results for arbitrary distributions of points. This lemma shows that the asymptotic rate of convergence of the absolute difference of distances between the nearest and furthest points is dependent on the distance norm used. To recap, we restate Lemma 2:

Lemma 2. Let F be an arbitrary distribution of N = 2 points. Then, lim_{d→∞} E[(Dmax_d^k - Dmin_d^k)/d^{1/k-1/2}] = C_k, where C_k is some constant dependent on k.

Proof. Let A and B be the two points in a d-dimensional data distribution such that each coordinate is independently drawn from the data distribution F. Specifically A = (P_1 ... P_d) and B = (Q_1 ... Q_d), with P_i and Q_i being drawn from F. Let PA_d = (Σ_{i=1}^d (P_i)^k)^{1/k} be the distance of A to the origin using the L_k metric and PB_d = (Σ_{i=1}^d (Q_i)^k)^{1/k} the distance of B. We assume that the k-th power of a random variable drawn from the distribution F has mean μ_{F,k} and standard deviation σ_{F,k}. This means that (PA_d)^k/d →_p μ_{F,k} and (PB_d)^k/d →_p μ_{F,k}, and therefore:

PA_d/d^{1/k} →_p (μ_{F,k})^{1/k},  PB_d/d^{1/k} →_p (μ_{F,k})^{1/k}.   (14)

We intend to show that E[|PA_d - PB_d|]/d^{1/k-1/2} converges to a constant C_k depending on k. We express PA_d - PB_d in the following numerator/denominator form, which we will use in order to examine the convergence behavior of the numerator and denominator individually:

PA_d - PB_d = ((PA_d)^k - (PB_d)^k) / (Σ_{r=0}^{k-1} (PA_d)^r · (PB_d)^{k-1-r}).   (15)

Dividing both sides by d^{1/k-1/2} and regrouping on the right-hand side we get:

(PA_d - PB_d)/d^{1/k-1/2} = (((PA_d)^k - (PB_d)^k)/√d) / (Σ_{r=0}^{k-1} (PA_d/d^{1/k})^r · (PB_d/d^{1/k})^{k-1-r}).   (16)

Consequently, using Slutsky's theorem and the results of Equation 14 we have:

Σ_{r=0}^{k-1} (PA_d/d^{1/k})^r · (PB_d/d^{1/k})^{k-1-r} →_p k · (μ_{F,k})^{(k-1)/k}.   (17)

Having characterized the convergence behavior of the denominator of the right-hand side of Equation 16, let us now examine the behavior of the numerator: ((PA_d)^k - (PB_d)^k)/√d = Σ_{i=1}^d ((P_i)^k - (Q_i)^k)/√d = Σ_{i=1}^d R_i/√d. Here R_i is the new random variable defined by (P_i)^k - (Q_i)^k for i in {1, ..., d}. This random variable has zero mean and standard deviation √2·σ_{F,k}, where σ_{F,k} is the standard deviation of (P_i)^k. Then, the sum of the d different values

15 434 C.C. Aggarwal, A. Hinneburg, an D.A. Keim of R i over imensions will converge to a normal istribution with mean 0 an stanar eviation 2 σ F, because of the central limit theorem. Consequently, the mean average eviation of this istribution will be C σ F, for some constant C. Therefore, we have: [ (PA ) (PB ) ] lim E σ F, (8) Since the enominator of Equation 6 shows probabilistic convergence, we can combine the results of Equations 7 an 8 to obtain: [ ] PA PB σ F, lim E (9) / /2 µ ( )/ F, The result follows. Confusion Matrices. We have illustrate the confusion matrices for two ifferent values of p below. As illustrate, the confusion matrix for using the value p =0.3 is significantly better than the one obtaine using p =2. Table 4. Confusion Matrix- p=2, (rows for prototype, colums for cluster) Table 5. Confusion Matrix- p=0.3, (rows for prototype, colums for cluster)
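The scaled convergence in Lemma 2 can be checked numerically. The following sketch is ours, not part of the paper; it takes F to be the uniform distribution on [0,1] (an assumption made purely for illustration) and uses Monte Carlo estimation of the scaled expectation E[|PA_d − PB_d|] / d^{1/k − 1/2}. For each k, the estimate settles near a constant C_k as d grows, as the lemma predicts.

```python
import numpy as np

def scaled_gap(k, d, trials=4000, seed=0):
    """Monte Carlo estimate of E[|PA_d - PB_d|] / d^(1/k - 1/2) for two
    points whose coordinates are i.i.d. uniform(0,1) (assumed F)."""
    rng = np.random.default_rng(seed)
    A = rng.random((trials, d))
    B = rng.random((trials, d))
    pa = (A ** k).sum(axis=1) ** (1.0 / k)   # L_k distance of A to the origin
    pb = (B ** k).sum(axis=1) ** (1.0 / k)   # L_k distance of B to the origin
    return np.abs(pa - pb).mean() / d ** (1.0 / k - 0.5)

for k in (1, 2, 3):
    estimates = [scaled_gap(k, d) for d in (50, 200, 800)]
    print(f"k={k}:", [round(c, 3) for c in estimates])
```

For k = 1 the limit is available in closed form: the numerator of Equation 6 tends to E|N(0, 2σ²)| = 2σ/√π ≈ 0.326 with σ² = 1/12 for the uniform distribution, and the denominator tends to 1, which the simulation should reproduce.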
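The qualitative message of the confusion matrices, namely that a small exponent such as p = 0.3 discriminates better in high dimensions than p = 2, can also be seen directly in the relative contrast (Dmax − Dmin)/Dmin of distances to a query. The sketch below is illustrative only: the uniform data, the query point, and the dimensionality are our assumptions, not the paper's experimental setup. Note that for p < 1 the triangle inequality fails, so the fractional L_p function is a dissimilarity measure rather than a metric in the strict sense.

```python
import numpy as np

def lp_distances(q, X, p):
    """Minkowski L_p distance from query q to each row of X (fractional p > 0 allowed)."""
    return (np.abs(X - q) ** p).sum(axis=1) ** (1.0 / p)

rng = np.random.default_rng(1)
d, n = 100, 1000
X = rng.random((n, d))   # assumed uniform data, purely illustrative
q = rng.random(d)

for p in (2.0, 0.3):
    dist = lp_distances(q, X, p)
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"p={p}: relative contrast = {contrast:.3f}")
```

On this synthetic data the relative contrast for p = 0.3 comes out markedly larger than for p = 2, consistent with the paper's argument that fractional exponents keep nearest-neighbor distinctions meaningful as dimensionality grows.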


More information

Dot trajectories in the superposition of random screens: analysis and synthesis

Dot trajectories in the superposition of random screens: analysis and synthesis 1472 J. Opt. Soc. Am. A/ Vol. 21, No. 8/ August 2004 Isaac Amiror Dot trajectories in the superposition of ranom screens: analysis an synthesis Isaac Amiror Laboratoire e Systèmes Périphériques, Ecole

More information

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number

More information

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite

More information

Monotonicity for excited random walk in high dimensions

Monotonicity for excited random walk in high dimensions Monotonicity for excite ranom walk in high imensions Remco van er Hofsta Mark Holmes March, 2009 Abstract We prove that the rift θ, β) for excite ranom walk in imension is monotone in the excitement parameter

More information

Counting Lattice Points in Polytopes: The Ehrhart Theory

Counting Lattice Points in Polytopes: The Ehrhart Theory 3 Counting Lattice Points in Polytopes: The Ehrhart Theory Ubi materia, ibi geometria. Johannes Kepler (1571 1630) Given the profusion of examples that gave rise to the polynomial behavior of the integer-point

More information

Introduction to variational calculus: Lecture notes 1

Introduction to variational calculus: Lecture notes 1 October 10, 2006 Introuction to variational calculus: Lecture notes 1 Ewin Langmann Mathematical Physics, KTH Physics, AlbaNova, SE-106 91 Stockholm, Sween Abstract I give an informal summary of variational

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information