
arXiv: v3 [stat.ML] 17 Oct 2017

A New Family of Near-metrics for Universal Similarity

Chu Wang    Iraj Saniee    William S. Kenney    Chris A. White

October 18, 2017

Abstract

We propose a family of near-metrics based on local graph diffusion to capture similarity for a wide class of data sets. These quasi-metametrics, as their name suggests, dispense with one or two standard axioms of metric spaces, specifically distinguishability and symmetry, so that similarity between data points of arbitrary type and form could be measured broadly and effectively. The proposed near-metric family includes the forward k-step diffusion and its reverse, typically on the graph consisting of data objects and their features. By construction, this family of near-metrics is particularly appropriate for categorical data, continuous data, and vector representations of images and text extracted via deep learning approaches. We conduct extensive experiments to evaluate the performance of this family of similarity measures and compare and contrast with traditional measures of similarity used for each specific application and with the ground truth when available. We show that for structured data including categorical and continuous data, the near-metrics corresponding to normalized forward k-step diffusion (k small) work as one of the best performing similarity measures; for vector representations of text and images including those extracted from deep learning, the near-metrics derived from normalized and reverse k-step graph diffusion (k very small) exhibit outstanding ability to distinguish data points from different classes.

Nokia Bell Labs, 600 Mountain Avenue, Murray Hill, NJ (chu.wang, iraj.saniee, william.kenney, chris.white@nokia-bell-labs.com).

1 Introduction

A core requirement and a fundamental module in various machine learning tasks is to measure the similarity between data points. Clustering, community detection, search and query, and recommendation algorithms, to name only a few, involve similarity derivations that are essentially constructed on top of underlying distance measures. To this end, numerous similarities have been designed specifically for particular types of data or data sets in order to optimize performance [4, 11, 33].

For categorical data sets, the most straightforward similarity measure is the overlap measure, which counts the number of matching attributes between two data points. More delicate measures take into consideration the number of categories, feature frequencies, and other local or global information. Frequently used measures for categorical data include but are not limited to Eskin, Lin, Goodall, and Occurrence Frequency. We refer interested readers to [4] for a comprehensive review.

For continuous data sets, completely different similarity measures have been used. Cosine similarity and the inner product similarity are textbook approaches [13]. The Minkowski distance is widely used, which includes both the Manhattan distance (order 1 version) and the Euclidean distance (order 2 version). If data points are generated from the same distribution, the Mahalanobis distance, or more broadly the Bregman divergence, is frequently adopted [7, 5]. For distances between data sets or distributions, the Bhattacharyya and Hellinger distances and the Kullback-Leibler divergence are often leveraged [3, 17, 15]. The Hamming distance continues to be used for bit-by-bit string comparison [12]. At the risk of information loss, categorical similarities also apply if the data is discretized first. The MDL method is the most well-known algorithm for such a discretization task [9].

Besides the structured data types mentioned above, unstructured data, such as text, audio, image, and video, are even more commonly encountered [10]. It is not straightforward to regard unstructured data as in a simple vector form, because of its sequential or geometrical properties [28, 21, 26, 27, 11]. One needs to carefully transform such unstructured data into a vector representation before applying similarity measures for reasonable success. For text data, the traditional tf-idf method captures the frequency information while losing the order information [30]. More recent deep learning approaches like the word2vec model map any given word into a dense, low-dimensional (usually no more than a few hundred tuple) vector [20]. Deep-learning based vector representation is already making its way into various applications [16, 6, 14, 34], yet to the best of our knowledge, there is currently no related work on similarity measures designed for such vectors.
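As a concrete reference point for the comparisons later in the paper, here is a minimal sketch (our own illustration, not from the paper) of a few of the classical measures reviewed above, assuming plain NumPy vectors:

```python
import numpy as np

def overlap(x, y):
    # Categorical overlap: count of matching attributes.
    return int(np.sum(np.asarray(x) == np.asarray(y)))

def minkowski(x, y, order=2):
    # order=1 gives the Manhattan distance, order=2 the Euclidean distance.
    d = np.abs(np.asarray(x, float) - np.asarray(y, float))
    return float(np.sum(d ** order) ** (1.0 / order))

def cosine(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
```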

In this paper, we propose a family of near-metrics based on local graph diffusion to quantify a distance or likeness between data points. Instead of focusing on a specific data set or data type, our motivation is to develop a family of distance-like functions, or near-metrics, that are sufficiently general to capture all the above data types in a single representation. For n data points, each of them with m features, we regard the data set as a bipartite graph consisting of n object nodes and m feature nodes connected by mn weighted edges. The edge weights measure the presence, extent or strength, of each feature in each data object. The similarity of one data point to another can then be defined by the transition probability of the random walk on this bipartite graph from one point to the other for a given number of steps. Such a graph diffusion-based similarity has several natural variants according to the locality parameter k, which is the number of steps of the random walk (its order), the diffusion direction (the forward or reverse variant), and the normalization of the feature weights (the normalized variant), where the edge weights are rescaled at each incident node to add to 1.

1.1 Key Contributions

The near-metrics proposed are based on the abstract concept of diffusion on the object-feature graph. Therefore, they are not limited to specific data types or data sets and we expect them to work in a wide variety of areas. We follow a dual path: first we analyze the graph diffusion similarity itself theoretically in order to uncover its properties and features; second, we evaluate the performance of this family of distances on various data sets and data types in comparison with frequently used similarities and, when possible, with the ground truth. We summarize our key contributions as follows:

We propose a family of near-metrics. The intuition behind the localized (i.e., finite-step) graph diffusion similarity is the mass transfer analogy of a random walk from one object to another via their common features: the more common features two objects have, the more mass is transferred from one object to the other.

The family of near-metrics is analyzed theoretically in several respects. We show that, when applied to categorical datasets or distributions, the graph diffusion is a metametric.

For general data sets, conditions for the proposed near-metrics to be metametrics or quasi-metametrics are thoroughly discussed. The analysis demonstrates that the proposed near-metrics are able to function as well-defined similarities, while giving enough flexibility to accommodate different types of data.

We evaluate this family of near-metrics on a wide variety of benchmark datasets of all types including categorical, continuous, text, images, and their mixtures. For structured data sets, the proposed near-metrics are among the best performing measures of similarity. For compact representations of unstructured data like texts and images, the proposed near-metrics capture the intrinsic relation between data points and their performance is outstanding.

1.2 Organization of the Paper

The rest of the paper is organized as follows. In Section 2, we formally introduce the graph diffusion similarity and its variants, discuss their basic properties, and provide necessary and sufficient conditions for these to be quasi-metrics (loss of symmetry) or metametrics (loss of complete distinguishability). In Section 4, we evaluate the performance of the graph diffusion similarity together with frequently used similarity measures on various benchmark structured data sets. In Section 5, the evaluation is further conducted for vector representations of unstructured data including texts and images. Section 6 discusses our results and observations, proposes future research directions, and concludes the paper.

2 Graph Diffusion Similarity

Consider n objects, each endowed with a non-negative feature vector of dimension m. All the information about objects and their features is captured by a bipartite graph B of n object nodes and m feature nodes together with the edges between them. The weight of the edge linking object i and feature j is defined as the strength or value of feature j of object i. Continuous data sets, binary data sets, and vector representations of unstructured data can be mapped to this bipartite graph form directly; for categorical data, the transformation is also straightforward: a categorical feature with l different categories is replaced by an l-bit one-hot binary feature vector. Notice that this transformation does not incur any information loss. A minimal sketch of this construction is given below.
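In the sketch (the helper build_feature_matrix and the toy records are our own illustration, not the authors' code), numeric attributes become edge weights directly and each categorical attribute becomes an l-bit one-hot block:

```python
import numpy as np

def build_feature_matrix(records, categorical_cols, numeric_cols):
    """Map mixed records to the non-negative feature matrix W of the
    object-feature bipartite graph B: numeric columns become edge
    weights directly, categorical columns are one-hot encoded."""
    # Collect the category vocabulary of each categorical column.
    vocab = {c: sorted({r[c] for r in records}) for c in categorical_cols}
    rows = []
    for r in records:
        row = [float(r[c]) for c in numeric_cols]            # numeric edge weights
        for c in categorical_cols:                           # l-bit one-hot block
            row += [1.0 if r[c] == v else 0.0 for v in vocab[c]]
        rows.append(row)
    return np.array(rows)                                    # shape (n, m)

# Example with two categorical and one numeric attribute.
records = [{"color": "red", "shape": "round", "size": 2.0},
           {"color": "blue", "shape": "round", "size": 1.5}]
W = build_feature_matrix(records, ["color", "shape"], ["size"])
```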

An example with n = 4 and m = 5 is shown in Figure 1(a), where the blue nodes are objects and the orange nodes are features. To illustrate the idea behind the graph diffusion similarity, we consider the random walk on B as follows. We initially place a particle at any object node the similarity to which we wish to quantify, and let a random walk on the graph take place. For example, in Figure 1 the particle is originally placed at the first node. Since there are 3 edges pointing away with weights 3, 5, and 2, the probability vector that the particle ends up at feature nodes 1 to 5 is (0.3, 0.5, 0, 0.2, 0), respectively; see Figure 1(b). We call this a 1-step diffusion or random walk on B starting at node 1. Now the particle conducts another step of the random walk, and the corresponding probability (to arrive at any given object node) can be computed by adding the probabilities from all feature nodes; see Figures 1(c) and 1(d). We call these two steps of the random walk one round. The induced subgraph after one round (2 steps) gives rise to an object-object graph that we denote by G. In the language of Markov chain theory, the 2m-step random walk on B is the first m rounds of iterations in the computation of the stationary distribution (principal eigenvector) of the row-normalized adjacency matrix of G starting at the localization vector u = (0, ..., 1, ..., 0), where the 1 is placed at node 1 in the above example; see [22] for details of the iterative or power method. Random walk theory is frequently used for the task of community detection [29, 23], yet to the best of our knowledge, there is no previous work adopting random walks for the construction of similarity measures.

Figure 1: Example: the calculation of graph diffusion similarity.

Now let the particle start at object i in G; we then define the order k diffusion similarity of i to j, denoted by $g^{(k)}(i, j)$, as the probability that the particle starting at i ends up at j after k steps in G, or 2k steps in B. Notice that similar objects, that is, those with similar features and strengths, have stronger connections in the bipartite graph B, and consequently the k-round transition probability between them in G will be higher. A two-step sketch of this walk on the Figure 1 example is given below.
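The two steps of one round can be written out directly. The sketch below assumes a hypothetical completion of the Figure 1 edge weights; only object 1's weights (3, 5, and 2 on features 1, 2, and 4) are given in the text, and the other rows are made up for illustration.

```python
import numpy as np

# Hypothetical 4x5 object-feature weight matrix; only the first row
# is taken from the Figure 1 description.
W = np.array([[3.0, 5.0, 0.0, 2.0, 0.0],
              [1.0, 0.0, 4.0, 0.0, 2.0],
              [0.0, 2.0, 1.0, 3.0, 0.0],
              [2.0, 0.0, 0.0, 1.0, 4.0]])

p = W / W.sum(axis=1, keepdims=True)     # object -> feature step (row-normalized)
q = W / W.sum(axis=0, keepdims=True)     # feature -> object step (column-normalized)

# 1-step diffusion from object 1 (index 0): the particle lands on a feature.
feat_probs = p[0]                        # -> [0.3, 0.5, 0.0, 0.2, 0.0]

# Second step: back to the objects, summing contributions over all features.
obj_probs = feat_probs @ q.T             # one full round on B, one step on G
print(feat_probs, obj_probs, obj_probs.sum())   # obj_probs sums to 1
```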

This family of similarity measures may thus be seen as a truncated and localized version of the principal eigenvector computation on G. Unlike this computation, we are interested in the finite-step transition probability between a pair of nodes (i, j) on G as a measure of similarity between i and j, which the principal eigenvector does not provide.

It is clear that $g^{(k)}(i, j)$ is not necessarily symmetric; therefore we define $r^{(k)}(i, j) := g^{(k)}(j, i)$, which we will refer to as the reverse diffusion similarity, and it quantifies the k-step similarity of j to i. To balance the importance of each feature, one can normalize each feature vector's row-sum to 1 and then calculate the graph diffusion similarity. We call the corresponding similarity, denoted by $n^{(k)}(i, j)$, the normalized graph diffusion similarity. We will show later that the normalized graph diffusion similarity is symmetric: $n^{(k)}(i, j) = n^{(k)}(j, i)$. All of the above are measures of similarity, each with a corresponding measure of distance: $d_g^{(k)}(i, j) := 1 - g^{(k)}(i, j)$, $d_r^{(k)}(i, j) := 1 - r^{(k)}(i, j)$, and $d_n^{(k)}(i, j) := 1 - n^{(k)}(i, j)$, which are the graph diffusion distance, reverse graph diffusion distance, and normalized graph diffusion distance, respectively.

To simplify the analysis, we introduce the matrix form of the above graph diffusion similarity and distance measures. For the $n \times m$ feature matrix $W = (w_{ij})$, where $1 \le i \le n$ and $1 \le j \le m$, define diagonal matrices $P = (p_{ij})$ and $Q = (q_{ij})$ as

$$p_{ll} = \sum_{s=1}^{m} w_{ls}, \qquad q_{ll} = \sum_{s=1}^{n} w_{sl}.$$

In other words, P and Q are the row-sum and column-sum diagonal matrices corresponding to W. We assume that $p_{ll}$ and $q_{ll}$ are non-zero; otherwise we can discard the null object or remove the absent feature. When its dimension is understood, let $\mathbf{1}$ be the all-one column vector. Define the $n \times n$ matrix S as:

$$S := P^{-1} W Q^{-1} W^{\top}. \qquad (1)$$

Based on the definition of P and Q, it is clear that S is a row-stochastic matrix since $S\mathbf{1} = P^{-1} W Q^{-1} W^{\top} \mathbf{1} = P^{-1} W \mathbf{1} = \mathbf{1}$. Recall that for the random walks on B or G, the matrix S is actually the single-step transition matrix on G, or the two-step transition matrix on B. Let $G^{(k)} = (g^{(k)}(i, j))$ be the $n \times n$ matrix of the pairwise graph diffusion similarity; then it is clear that $G^{(1)} = S$. The higher order diffusion similarity is straightforward to calculate:

$$G^{(k)} = S^k = \left( P^{-1} W Q^{-1} W^{\top} \right)^k. \qquad (2)$$
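A direct implementation of (1) and (2) is short. The sketch below (ours, not the authors' code) computes the forward, reverse, and normalized variants from a non-negative feature matrix W:

```python
import numpy as np

def diffusion_similarity(W, k=1):
    """Order-k graph diffusion similarity G^(k) = (P^-1 W Q^-1 W^T)^k for a
    non-negative n x m feature matrix W with no zero rows or columns."""
    P_inv = 1.0 / W.sum(axis=1)          # inverse row sums (diagonal of P^-1)
    Q_inv = 1.0 / W.sum(axis=0)          # inverse column sums (diagonal of Q^-1)
    S = (P_inv[:, None] * W) @ (Q_inv[:, None] * W.T)   # row-stochastic, eq. (1)
    return np.linalg.matrix_power(S, k)                  # eq. (2)

def reverse_similarity(W, k=1):
    # r^(k)(i, j) := g^(k)(j, i)
    return diffusion_similarity(W, k).T

def normalized_similarity(W, k=1):
    # Normalize each object's feature row to sum to 1 before diffusing.
    return diffusion_similarity(W / W.sum(axis=1, keepdims=True), k)

W = np.array([[3.0, 5.0, 0.0, 2.0, 0.0],
              [1.0, 0.0, 4.0, 0.0, 2.0],
              [0.0, 2.0, 1.0, 3.0, 0.0],
              [2.0, 0.0, 0.0, 1.0, 4.0]])
G2 = diffusion_similarity(W, k=2)        # rows sum to 1
D2 = 1.0 - G2                            # graph diffusion distance d_g^(2)
```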

In particular, for $g^{(1)}(i, j)$, an explicit formula can be written as:

$$g^{(1)}(i, j) = \sum_{s=1}^{m} \frac{w_{is}}{w_{i1} + \cdots + w_{im}} \cdot \frac{w_{js}}{w_{1s} + \cdots + w_{ns}} = \frac{1}{p_{ii}} \sum_{s=1}^{m} \frac{w_{is} w_{js}}{q_{ss}}. \qquad (3)$$

A practical issue worth mentioning is the computational cost of the graph diffusion distance. If the goal is to compute a single pair similarity, (3) shows that the cost is O(mn), which is less than ideal. For example, the Euclidean distance or cosine similarity only requires O(m) calculations. However, note that the graph diffusion similarity for one pair of objects is not as important as the similarity between a set of objects and a fixed object. To wit, for similarity search and related tasks relative to an object i, one may need to calculate $g^{(1)}(i, j)$ for all j followed by ranking. From the matrix form (1), it is clear that the computational cost for this task is still O(mn), which scales linearly in the number of objects and the number of features. Again from (1), the computation of the graph diffusion similarity for all pairs of objects requires $O(mn^2)$ calculations, which is the same as other traditional similarities. The reverse and normalized variants only involve matrix transpose and normalization operations, thus the cost is of the same order. Since the calculation of the graph diffusion distance can be written in the matrix form (1), parallel computing is also straightforward to use if and when needed.

3 Metametrics and Quasi-Metametrics

In this section, we analyze the conditions under which the graph diffusion distance or its variants come close to a bona fide metric. We assume that the bipartite graph B is connected; otherwise the data set can be partitioned into groups of objects with completely different features. As it turns out, we do not actually need the full strength of a metric but only 1/2 of the key metric properties to judiciously separate points. In fact, some metric properties such as symmetry are even unnatural in our context: among an object A's neighbors, B is the most similar to A, but among the many more neighbors of B, A may not be that similar to B. From the four key metric properties, we wish to keep non-negativity and the triangle inequality to preserve a notion of neighborhood. It turns out the intersection of a quasimetric and a metametric would be sufficient for our purposes.

A quasimetric is a metric without the symmetry property, while a metametric does not require two objects with the same features to be identical. In order for a function $d(\cdot, \cdot)$ to be a metametric, d has to be non-negative and symmetric, $d(x, y) = 0$ has to imply x = y but not necessarily vice versa, and d has to satisfy the triangle inequality $d(x, y) + d(y, z) \ge d(x, z)$. Notice that in (3), when $p_{ii}$ is the same for all i, $g^{(1)}(i, j)$ becomes symmetric. In fact, the symmetry is inherited by $g^{(k)}(i, j)$, and we have

Theorem 3.1. The normalized graph diffusion distance of order k, namely $d_n^{(k)}(\cdot, \cdot)$, is a metametric. When applied to distributions or categorical data, the forward, reverse, and normalized graph diffusion distances become identical, and are all metametrics as well.

A quasi-metametric is a metametric without symmetry. A quasi-metametric captures the key asymmetric relations in object-feature data sets and provides a good neighborhood structure. Therefore, a quasi-metametric has only the necessary properties for a similarity to be useful. In the case when the $p_{ii}$ are different for different i, symmetry no longer holds, and thus a quasi-metametric is our best shot:

Theorem 3.2. Let P be the row-sum diagonal matrix for W. If $\min_i p_{ii} / \max_i p_{ii} > 2/3$, then both the forward graph diffusion distance $d_g^{(1)}(\cdot, \cdot)$ and the reverse graph diffusion distance $d_r^{(1)}(\cdot, \cdot)$ are quasi-metametrics.

Theorem 3.2 shows that, if the row-sums of the features are comparable, then the order 1 forward graph diffusion distance and its reverse version are quasi-metametrics. However, since the similarity vector $g^{(k)}(i, \cdot)$ eventually converges to the equilibrium vector of the graph G, the triangle inequality cannot hold for large k if the equilibrium vector itself does not satisfy the triangle inequality. On the other hand, the similarity vector $r^{(k)}(i, \cdot)$ converges to a constant vector, so its triangle inequality holds. Detailed proofs of Theorem 3.1 and Theorem 3.2 are left to the supplementary material. A quick numeric check of both theorems is sketched below.
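As a sanity check (ours, not part of the paper), the following sketch verifies numerically that the normalized diffusion similarity is symmetric (Theorem 3.1) and that, when the row-sum condition of Theorem 3.2 holds, the order-1 graph diffusion distance satisfies the triangle inequality:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
# Random non-negative features with comparable row sums, so that the
# Theorem 3.2 condition min_i p_ii / max_i p_ii > 2/3 holds.
W = rng.uniform(0.8, 1.2, size=(6, 10))
p = W.sum(axis=1)
assert p.min() / p.max() > 2 / 3

G1 = (W / p[:, None]) @ (W / W.sum(axis=0)).T   # forward g^(1), eq. (1)

Wn = W / p[:, None]                              # row-normalized features
N1 = Wn @ (Wn / Wn.sum(axis=0)).T                # normalized n^(1): P becomes I

# Theorem 3.1: the normalized diffusion similarity is symmetric.
assert np.allclose(N1, N1.T)

# Theorem 3.2: d_g^(1) = 1 - g^(1) satisfies the triangle inequality.
D = 1.0 - G1
for i, j, k in permutations(range(len(D)), 3):
    assert D[i, j] + D[j, k] >= D[i, k] - 1e-12
```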

4 Experiments on Structured Data

In this section, we evaluate the performance of graph diffusion similarities on structured data sets. Since similarity is used, explicitly or implicitly, in a wide spectrum of data science and machine learning studies, it is difficult to analyze and evaluate the newly proposed near-metrics thoroughly in all kinds of tasks and applications. Instead, we aim at first-principle experiments, in which we check whether two data points that are close according to a similarity share the same ground truth label. More specifically, let x be any chosen data point and y the corresponding label. To test the performance of a similarity measure S, we first rank all the data points with respect to their similarities to x. Then for any $0 < f \le 1$, we calculate the proportion of data points that hold different labels compared to y among the (nf)-nearest neighbors of x, which yields the error value $e_S(x, f)$ of data point x at f. The error curve is defined as the average error over all the data points:

$$E_S(f) := \frac{1}{n} \sum_{x} e_S(x, f). \qquad (4)$$

Notice that $E_S(1)$ does not depend on the similarity measure adopted. It is determined by the number of data points in each class. For example, if the data set contains two classes with equal numbers of data points, then $E_S(1) = 0.5$. We define $E_S(0) = 0$ for convenience. Naturally, the error curve $E_S(f)$ is expected to grow (but not necessarily) as f becomes larger. In the following experiments, we did encounter cases where this curve fluctuates, but most of the time we found it increases monotonically.

In Table 1, we demonstrate the performance of 10 existing similarity measures and the graph diffusion similarity measures of order 1 to 7, denoted by GD1 to GD7. The existing similarity measures include overlap, Eskin, IOF, OF, Lin, Goodall3, Goodall4, inner product, Euclidean, and cosine. We use reverse graph diffusion during the experiments. Recall that, when applied to categorical features, all graph diffusion similarities coincide. Among the tested data sets, the results of 11 are shown in Table 1. Nine of the shown data sets are from the UCI Machine Learning Repository [1]. LC and PR are loan-level data sets from the two largest P2P sites, Prosper and Lending Club [18, 24]. Because of the page limit, we are not able to demonstrate the error curves directly in this section; instead, we show the values of $E_S(0.01)$, $E_S(0.02)$, and $E_S(0.05)$ in Table 1, which correspond to the average errors at the 1%, 2%, and 5% nearest neighbor sets.
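The entries of Table 1 come from the error-curve protocol in (4). Below is a minimal sketch (our own, not the authors' code) of computing $E_S(f)$ from a precomputed similarity matrix; whether a point counts among its own neighbors is left open in the text, and we exclude it here.

```python
import numpy as np

def error_curve(sim, labels, fractions):
    """E_S(f) from eq. (4): for each point x, rank all points by similarity
    to x and measure the fraction of label mismatches among the top n*f."""
    labels = np.asarray(labels)
    n = len(labels)
    order = np.argsort(-sim, axis=1)     # neighbors by decreasing similarity
    curve = []
    for f in fractions:
        top = max(1, int(n * f))
        errs = []
        for x in range(n):
            neigh = [j for j in order[x] if j != x][:top]   # drop the point itself
            errs.append(np.mean(labels[neigh] != labels[x]))
        curve.append(np.mean(errs))      # average over all data points
    return np.array(curve)

# Usage with the diffusion similarity sketched earlier:
# E = error_curve(diffusion_similarity(W, k=1), labels, [0.01, 0.02, 0.05])
```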

Table 1: Evaluation of similarity measures on different data sets. The columns are overlap, Eskin, IOF, OF, Lin, Goodall3 (G3), Goodall4 (G4), inner product, l2, cosine, and graph diffusion of orders 1 to 7 (GD1-GD7); the rows are the Balance Scale, SPECT Heart, Lending Club, Nursery, Prosper, Tic-Tac-Toe, Car Evaluation, Chess, Hayes-Roth, Connect, and Solar Flare data sets. For each similarity S, the three rows give the values of $E_S(0.01)$, $E_S(0.02)$, and $E_S(0.05)$; for each data set, its object number n and attribute number m are listed below its name. [The numeric entries of the table did not survive extraction.]

We make several key observations as follows. Table 1 shows that no single similarity measure dominates all others, which is in accordance with the observation in [4] and the no free lunch theorem in optimization and machine learning [35]. The order 1 forward graph diffusion similarity $g^{(1)}(\cdot, \cdot)$ is among the best, while IOF, Lin, and Goodall3 also perform well on certain data sets. In addition, it can be observed that $g^{(1)}(\cdot, \cdot)$ usually performs best compared to its higher order versions. We will come back to this observation in the next section.

5 Experiments on Unstructured Data

In this section, we present experimental evaluations of similarity measures on vector representations of unstructured data. Unstructured data like text and images, despite their vast volume and importance, are less understood and more difficult to analyze. The major reason appears to be that unstructured data cannot be used directly for the tasks of search, clustering, or recommendation. To this end, various ways of transforming unstructured data into vector form have been proposed [20, 16]. For example, tf-idf converts a sentence or a document into a sparse, high-dimensional vector [31], and the word2vec model translates texts into dense vectors with semantics in low dimensions [20]. Similar approaches for data sets consisting of or including images are still awaiting detailed documentation. In this section, we focus on the experimental evaluation of similarity measures based on vector representations derived from tf-idf and deep learning. Our evaluation criterion remains the same as in Section 4: we use the error curve $E_S(f)$ for comparison. The experiments are conducted on the IMDB movie review dataset [19], the customer verbatim dataset, and the ImageNet dataset [8].

5.1 IMDB Movie Review Dataset

The IMDB dataset consists of 50,000 movie reviews in text form [19]. The length of a review varies from very short to more than 2,000 words. Each movie review is associated with a binary sentiment polarity label. There are 25,000 positive reviews and 25,000 negative reviews. Naturally, a good similarity measure should yield a higher similarity score for reviews holding the same label. We test the similarity measures on two kinds of representations: the tf-idf representation and a compact embedding from a convolutional neural network.

tf-idf representation. A direct way of translating a review into a feature vector is via the tf-idf representation [31]. In general, the importance of a word in a review is addressed by its frequency in that review multiplied by its inverse document frequency in the entire corpus.
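As a rough sketch of this representation (using scikit-learn's TfidfVectorizer; the paper does not specify which implementation was used), the reviews become rows of a sparse non-negative matrix that can serve directly as the feature matrix W:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = ["a gripping, beautifully acted film",
           "dull plot and wooden acting",
           "beautifully shot but the plot is dull"]

# Term frequency weighted by inverse document frequency; rows are reviews,
# columns are vocabulary terms, entries are non-negative weights.
vec = TfidfVectorizer()
W_tfidf = vec.fit_transform(reviews).toarray()   # dense here only for illustration

# W_tfidf is a valid object-feature matrix for the graph diffusion similarity.
```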

Even though this transformation risks over-simplification, it captures some basic frequency-related information of the texts and is widely adopted [25, 2]. In the left panel of Figure 2, we demonstrate the error curves of the forward graph diffusion similarity, its reverse and normalized variants, and several traditionally used similarity measures. It can be observed that the traditional measures, including Euclidean, Manhattan, inner product, and cosine, are considerably outperformed by the reverse and the normalized graph diffusion similarities.

Figure 2: Error curves for the tf-idf representation of the IMDB movie review dataset.

The members of the family of graph diffusion similarities also perform differently from one another. We demonstrate in the right panel of Figure 2 the reverse graph diffusion similarity measures of orders 1 to 7 as an example. It can be observed that the performance improves and later decreases as the order increases, and the order 4 and order 5 curves are the best among the 7.

Compact embedding via deep learning. In addition to the tf-idf transformation, a more recent way of embedding a review into a vector space is via deep learning. The deep convolutional neural network we use here is designed for sentiment analysis by Kim [14]. The input layer converts any incoming paragraph into a vector of undetermined length, then convolutional windows of different sizes further transform it into vectors of values. After that, a max pooling layer eliminates the varying length, so that the number of nodes equals the number of convolution windows. After the training session, the part of the neural network from the input layer to the last hidden layer itself becomes a function that maps any paragraph of text into a fixed-length vector, as sketched below.
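A minimal sketch of this use of a trained network as an embedding function follows, in PyTorch; the class, layer sizes, and window sizes are our own stand-ins rather than Kim's exact architecture, and the paper's 128-dimensional feature vector corresponds to the total number of kernels.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Toy stand-in for a Kim-style sentence CNN: embedding, one
    convolution per window size, max pooling over time, classifier."""
    def __init__(self, vocab=10000, dim=64, kernels=128, sizes=(3, 4, 5)):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.convs = nn.ModuleList(nn.Conv1d(dim, kernels, s) for s in sizes)
        self.fc = nn.Linear(kernels * len(sizes), 2)   # sentiment head

    def features(self, tokens):                        # tokens: (batch, seq)
        x = self.emb(tokens).transpose(1, 2)           # (batch, dim, seq)
        # Max pooling over time removes the varying sequence length.
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.cat(pooled, dim=1)                # fixed-length embedding

    def forward(self, tokens):
        return self.fc(self.features(tokens))

# After training, model.features(...) maps any paragraph to a fixed-length,
# non-negative (post-ReLU) vector usable as a row of the feature matrix W.
```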

In the following experiments, the vectors are from a CNN with 128 convolution kernels, and thus the feature vector consists of 128 dimensions. Similar results are observed for different sizes of CNN, as long as the number of kernels is not too small to lose track of the original information in the sentences.

In Figure 3, we show the optimal error curve for reference. The optimal error curve corresponds to the optimal similarity measure under which each review's neighbors always carry the same opinion label and the reviews holding different labels are far away from each other. Therefore the first half of the optimal error curve is 0, and it then gradually increases to 0.5. It can be observed in the left panel of Figure 3 that the reverse and normalized graph diffusion similarity measures clearly outperform the others. Furthermore, the right panel of Figure 3 shows that the order 2 normalized graph diffusion similarity almost coincides with the optimal curve.

Figure 3: Error curves for the compact representation of the IMDB movie review dataset.

An interesting phenomenon regarding graph diffusion similarities of different orders emerges from Figure 2 and Figure 3: when the order increases, the performance improves at first (first phase) up to a critical order, but then decreases and later becomes completely random (second phase). This second phase is natural because the graph diffusion similarity $g^{(k)}(i, \cdot)$ converges to the equilibrium vector of the graph, and the reverse and normalized versions converge to a constant vector, all of which are doomed to be poor. As for the first phase and its critical order, for the compact representation in Figure 3, the best performance is achieved at order 2, whereas for the sparse representation of tf-idf, the optimal order is 4 or 5. From the discussion in Section 2, we recall that the speed of information propagation in the graph is highly related to the sparsity of the feature matrix W. When W, or the graph, is too sparse, the similarity between a pair of objects can hardly be adequately quantified if the initial probability mass is not well diffused in the entire system. Thus, we need a higher order to achieve good performance for sparse data. The convergence behind the second phase is easy to see numerically, as sketched below.
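In the sketch (ours), every row of $S^k$ approaches the same equilibrium vector as k grows, so $g^{(k)}(i, \cdot)$ stops distinguishing objects:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.uniform(size=(5, 8))                       # dense random features
p_inv = 1.0 / W.sum(axis=1)
S = (p_inv[:, None] * W) @ (W / W.sum(axis=0)).T   # G^(1), eq. (1)

# As k grows, every row of S^k approaches the same equilibrium vector,
# so the column-wise spread of G^(k) shrinks toward 0 and the similarity
# homogenizes across objects.
for k in (1, 2, 5, 20, 100):
    Gk = np.linalg.matrix_power(S, k)
    print(k, np.ptp(Gk, axis=0).max())             # max row spread per column
```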

5.2 Customer Verbatim Dataset

The customer verbatim dataset is a private dataset containing 21,621 paragraphs, each being a review of a company and its competitors' products and services quantified by a collection of business customers. Each of the reviews is labeled according to whether it is in favor of the company's competitors, and whether the customer likes the company's main product. Again, we first use a CNN to do an embedding for the extraction of compact feature vectors. Eventually, each review is represented by a 128-dimensional vector of non-negative values. We calculate the error curves for both the opinion-of-competitor label and the opinion-of-product label. The curves are shown in Figure 4. We observe that the graph diffusion similarity family is able to achieve near optimal performance, while traditional similarities perform only in a mediocre way. As before, the performance of the graph diffusion similarity family improves to reach a near optimal state (at order 2 for opinion-of-competitor and order 2 or 3 for opinion-of-product) and then deteriorates. This again supports our conjecture that the optimal order for the graph diffusion similarity is positively correlated with the feature sparsity.

5.3 ImageNet Dataset

The performance of the graph diffusion similarities is also tested on the ImageNet dataset [8]. We adopt the GoogLeNet model trained on 1.2 million images for a classification problem with 1,000 classes [32]. Again, we extract the last hidden layer, which contains 1,024 hidden nodes of non-negative values for each image. For better visualization, we randomly pick r classes out of the 1,000 different classes and calculate the corresponding error curves. We repeat this 200 times and compute the average error curves (a sketch of this protocol follows below). The cases r = 2 and r = 5 are demonstrated in Figure 5.
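The averaging protocol can be sketched as follows, reusing the error_curve and diffusion_similarity helpers from the earlier sketches (our own illustration; features and labels stand for the extracted 1,024-dimensional activations and their class ids):

```python
import numpy as np

def averaged_error_curve(features, labels, r, fractions, repeats=200, seed=0):
    """Average E_S(f) over `repeats` random subsets of r classes.
    Assumes error_curve() and diffusion_similarity() defined above."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    curves = []
    for _ in range(repeats):
        chosen = rng.choice(classes, size=r, replace=False)
        mask = np.isin(labels, chosen)
        # Reverse/normalized variants are analogous (transpose / row-normalize).
        sim = diffusion_similarity(features[mask], k=2)
        curves.append(error_curve(sim, labels[mask], fractions))
    return np.mean(curves, axis=0)
```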

Figure 4: (Left) Error curves for the competitor label; (right) error curves for the product label.

Figure 5: Error curves of (upper) r = 2 and (lower) r = 5 for the GoogLeNet features on ImageNet.

Again, compared to traditional similarities, the graph diffusion similarity family performs better in terms of distinguishing different classes. For the case of r = 5, the order 2 curve performs best, while for the case of r = 2 we observe a different behavior: the curves for orders 2 to 6 are almost identical, and then suddenly the order 7 curve becomes the constant 0.5, which indicates the similarity has homogenized for every pair of objects.

6 Discussion and Future Work

A key observation in the last section was that when increasing the number of iterations, the performance of the graph diffusion similarity usually improves at first and then becomes worse. The optimal number of iterations for sparse representations like tf-idf is typically larger (4 to 5), and for dense embeddings it is smaller (2). For the structured data sets in Section 4, the same phenomenon is observed. We have observed a similar phenomenon in other data sets which, due to the page limit, we do not demonstrate here. Purely from the perspective of graph theory, object-feature sparsity means that it takes more iterations for mass to diffuse from one object to another, which makes the graph diffusion similarity less ideal with a small number of iterations. We conjecture that this phenomenon should be common for all kinds of data sets, and we call for future work on building an explicit relation between the sparsity of the feature vectors and the optimal number of iterations.

References

[1] Arthur Asuncion and David Newman. UCI Machine Learning Repository.

[2] Adam Berger, Rich Caruana, David Cohn, Dayne Freitag, and Vibhu Mittal. Bridging the lexical chasm: statistical approaches to answer-finding. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.

[3] Anil Bhattacharyya. On a measure of divergence between two multinomial populations. Sankhyā: The Indian Journal of Statistics.

[4] Shyam Boriah, Varun Chandola, and Vipin Kumar. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining. SIAM.

[5] Lev M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3).

[6] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug).

[7] Roy De Maesschalck, Delphine Jouan-Rimbaud, and Désiré L. Massart. The Mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50(1):1-18.

[8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. IEEE.

[9] Usama Fayyad and Keki Irani. Multi-interval discretization of continuous-valued attributes for classification learning.

[10] Amir Gandomi and Murtaza Haider. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2).

[11] A. Ardeshir Goshtasby. Similarity and Dissimilarity Measures. Springer.

[12] Richard W. Hamming. Error detecting and error correcting codes. Bell Labs Technical Journal, 29(2).

[13] William P. Jones and George W. Furnas. Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6):420.

[14] Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint.

[15] Solomon Kullback and Richard A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79-86.

[16] Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents. In ICML, volume 14.

[17] Lucien Le Cam and Grace Lo Yang. Asymptotics in Statistics: Some Basic Concepts. Springer Science & Business Media.

[18] LendingClub. Lending Club Corporation.

[19] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA. Association for Computational Linguistics.

[20] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.

[21] Dimitrios Milioris and Philippe Jacquet. Joint sequence complexity analysis: Application to social networks information flow. Bell Labs Technical Journal, 18(4):75-88.

[22] James R. Norris. Markov Chains. Number 2. Cambridge University Press.

[23] Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. In International Symposium on Computer and Information Sciences. Springer.

[24] Prosper. Prosper Marketplace Inc.

[25] Juan Ramos et al. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning.

[26] Konrad Rieck. Similarity measures for sequential data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(4).

[27] Konrad Rieck and Pavel Laskov. Linear-time computation of similarity measures for sequential data. Journal of Machine Learning Research, 9(Jan):23-48.

[28] Konrad Rieck, Pavel Laskov, and Sören Sonnenburg. Computation of similarity measures for sequential data using generalized suffix trees. Advances in Neural Information Processing Systems, 19:1177.

[29] Martin Rosvall and Carl T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4).

[30] Gerard Salton and Michael J. McGill. Introduction to Modern Information Retrieval.

[31] Gerard Salton, Anita Wong, and Chung-Shu Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11).

[32] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9.

[33] Kai Ming Ting, Ye Zhu, Mark Carman, Yue Zhu, and Zhi-Hua Zhou. Overcoming key weaknesses of distance-based neighbourhood methods using a data dependent dissimilarity measure. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.

[34] Joseph Turian, Lev Ratinov, and Yoshua Bengio. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.

[35] David H. Wolpert. The supervised learning no-free-lunch theorems. In Soft Computing and Industry. Springer.

Appendix: Proofs of Theorem 3.1 and Theorem 3.2

The detailed proofs of Theorem 3.1 and Theorem 3.2 are as follows. Recall that an explicit formula for $g^{(1)}(i, j)$ is

$$g^{(1)}(i, j) = \sum_{s=1}^{m} \frac{w_{is}}{w_{i1} + \cdots + w_{im}} \cdot \frac{w_{js}}{w_{1s} + \cdots + w_{ns}} = \frac{1}{p_{ii}} \sum_{s=1}^{m} \frac{w_{is} w_{js}}{q_{ss}}. \qquad (5)$$

Basic Properties

We begin with some basic properties of the graph diffusion similarity and its variants. It is clear that $0 \le d_g^{(k)}(i, j) = 1 - g^{(k)}(i, j) \le 1$, since $g^{(k)}(i, j)$ is a (transition) probability or transferred mass. Further, we have

Proposition 6.1. If $d_g^{(k)}(i, j) = 0$, then i = j and object i is isolated from the rest of the objects. If $d_g^{(k)}(i, j) = 1$ for every k, then object j cannot be reached from object i in G.

Proof. If $d_g^{(k)}(i, j) = 0$, then $g^{(k)}(i, j) = 1$, so all the probability measure at i is transferred to j after k rounds. If $i \ne j$, then there should be a feature s such that $w_{is} > 0$ in order for the probability starting at i to propagate to j via a path. However, $w_{is} > 0$ means that i is also connected to itself, and thus there will always be a positive probability at i, which implies $d_g^{(k)}(i, j) > 0$. Thus, by contradiction, i = j, and for any s such that $w_{is} > 0$ we have $w_{ls} = 0$ for $l \ne i$; hence i is isolated. On the other hand, if $d_g^{(k)}(i, j) = 1$ for all k, then zero probability is transferred from i to j in any number of steps, therefore there is no path from i to j in G.

Notice that $d_g^{(k)}(i, i)$ is not necessarily 0, due to dispersion of mass to other nodes through common features. As for symmetry of the graph diffusion similarity, it is clear from (5) that $g^{(1)}(i, j)$ is not symmetric in general. However, we have

Proposition 6.2. A sufficient condition for $G^{(1)}$ to be symmetric is that the $p_{ii}$ are the same for all i. If the bipartite graph is connected, then the condition that all the $p_{ii}$ are the same is also necessary for $G^{(1)}$ to be symmetric.

Proof. The first part is straightforward by (5). For the second part of the statement, notice that for any pair (i, j), there should be a connected path i, k, ..., j, and i being connected to k means $w_{i1} w_{k1} + \cdots + w_{im} w_{km} > 0$, and

thus $p_{ii} = p_{kk}$ according to (5). The same argument holds for any consecutive objects in the path between i and j, which leads to the conclusion that $p_{ii} = p_{jj}$.

The Triangle Inequality

In this section, we assume $p_{ii}$ is a constant for all i, which could be achieved via scaling. We will show that this condition ensures the resulting graph diffusion distance is a metametric. It is clear that the normalized graph diffusion distance satisfies this condition, since it normalizes the feature weights before calculating similarity. Further, for distributional data and categorical data this condition always holds, since for distributions $p_{ii} = 1$, and for categorical data $p_{ii}$ equals the number of categories. Therefore, the forward, reverse, and normalized variants of the graph diffusion distance are identical when applied to distributions or categorical data. In the following analysis, we will use $d_n^{(k)}(\cdot, \cdot)$ for concreteness, and the proof for distributions and categorical data directly follows.

We first consider the case when the number of iterations is 1. In this case, we will prove:

Proposition 6.3. For any row-stochastic matrix W and its column-sum diagonal matrix Q, define the matrix $D = (d_{ij})$ as $D := \mathbf{1}\mathbf{1}^{\top} - W Q^{-1} W^{\top}$. Then D is a symmetric matrix, and the triangle inequality $d_{ij} + d_{jk} \ge d_{ik}$ holds for any $1 \le i, j, k \le n$.

Proof. Based on the definition of D, we have

$$d_{ij} = 1 - \sum_{k=1}^{m} \frac{w_{ik} w_{jk}}{q_{kk}}, \qquad (6)$$

and thus the symmetry of D follows immediately. For the triangle inequality, without loss of generality, we only need to prove that $d_{12} + d_{23} \ge d_{13}$, which is expanded as

$$\sum_{k=1}^{m} \frac{(w_{1k} + w_{3k}) w_{2k} - w_{1k} w_{3k}}{q_{kk}} \le 1. \qquad (7)$$

Notice that $w_{1k} w_{3k} \ge 0$ and $q_{kk} \ge w_{1k} + w_{2k} + w_{3k}$; then it is sufficient to prove that

$$\sum_{k=1}^{m} \frac{(w_{1k} + w_{3k}) w_{2k}}{w_{1k} + w_{2k} + w_{3k}} \le 1.$$

It is easy to check that $\frac{xy}{x+y} \le \frac{1}{9}x + \frac{4}{9}y$ holds for any x and y with $x + y > 0$, since it is equivalent to $(x - 2y)^2 \ge 0$. By letting $x = w_{1k} + w_{3k}$ and $y = w_{2k}$, we have

$$\sum_{k=1}^{m} \frac{(w_{1k} + w_{3k}) w_{2k}}{w_{1k} + w_{2k} + w_{3k}} \le \frac{1}{9} \sum_{k=1}^{m} (w_{1k} + w_{3k}) + \frac{4}{9} \sum_{k=1}^{m} w_{2k} = \frac{2}{3} < 1,$$

which completes the proof of the triangle inequality, and thus the proposition.

We note that the coefficient 2/3 in the above is tight, in the sense that there exists a construction of W such that all the above inequalities become equalities. The construction is as follows. Let $w_{11} = 1$, $w_{12} = 0$, $w_{21} = w_{22} = 1/2$, $w_{31} = 0$, $w_{32} = 1$. Let $w_{1k} = w_{2k} = w_{3k} = 0$ for $k \ge 3$. For all $r \ge 4$, let $w_{r1} = w_{r2} = 0$, and set $w_{rk}$ to any non-negative values such that $w_{r3} + \cdots + w_{rm} = 1$. It is clear that W under this construction is row-stochastic. Besides, $g^{(1)}(1, 2) = g^{(1)}(2, 3) = 1/3$ and $g^{(1)}(1, 3) = 0$, and hence $g^{(1)}(1, 2) + g^{(1)}(2, 3) - g^{(1)}(1, 3) = 2/3$.

Next we consider the triangle inequality for general order r. We will prove:

Proposition 6.4. For any row-stochastic matrix W with its column-sum diagonal matrix Q, define the matrix $D^{(r)} = (d^{(r)}_{ij})$ as $D^{(r)} := \mathbf{1}\mathbf{1}^{\top} - (W Q^{-1} W^{\top})^r$. Then the triangle inequality $d^{(r)}_{ij} + d^{(r)}_{jk} \ge d^{(r)}_{ik}$ holds for any $1 \le i, j, k \le n$ and any positive integer r.

Proof. The statement for r = 1 is proved in Proposition 6.3. We consider the case r = 2u (an even number) and the case r = 2u + 1 (an odd number) separately. For the case r = 2u, denote $\widetilde{W} = (W Q^{-1} W^{\top})^u$; then $\widetilde{W}$ is symmetric. Since both W and $Q^{-1} W^{\top}$ are row-stochastic, $\widetilde{W}$ is row-stochastic as well, and being symmetric it is actually doubly-stochastic, with $D^{(r)} = \mathbf{1}\mathbf{1}^{\top} - \widetilde{W} \widetilde{W}^{\top}$. Since $\widetilde{W}$ is doubly-stochastic, the corresponding column-sum diagonal matrix $\widetilde{Q}$

for $\widetilde{W}$ becomes an identity matrix. By regarding $\widetilde{W}$ as a new feature matrix in Proposition 6.3, the triangle inequality follows immediately.

For the case r = 2u + 1, let $\widehat{W} = \widetilde{W} W$; then

$$D^{(r)} = \mathbf{1}\mathbf{1}^{\top} - \widetilde{W} W Q^{-1} W^{\top} \widetilde{W}^{\top} = \mathbf{1}\mathbf{1}^{\top} - \widehat{W} Q^{-1} \widehat{W}^{\top}. \qquad (8)$$

Recalling Proposition 6.3, we only need to show that $\widehat{W}$ is a row-stochastic matrix and that Q is its column-sum matrix. Since both $\widetilde{W}$ and W are row-stochastic matrices, so is $\widehat{W} = \widetilde{W} W$. For its column sums, since $\widetilde{W}$ is doubly-stochastic, $\widehat{W}^{\top} \mathbf{1} = W^{\top} \widetilde{W}^{\top} \mathbf{1} = W^{\top} \mathbf{1}$; thus $\widehat{W}$ and W share the same column-sum matrix Q, which completes the proof.

Theorem 3.1 follows directly from Proposition 6.1, Proposition 6.2, Proposition 6.3, and Proposition 6.4.

Analysis of $d_g^{(k)}(\cdot, \cdot)$ and $d_r^{(k)}(\cdot, \cdot)$

For the forward graph diffusion distance $d_g^{(k)}(\cdot, \cdot)$ and its reverse version $d_r^{(k)}(\cdot, \cdot)$, symmetry is no longer guaranteed. Besides, the triangle inequality need not hold in general. We aim to find sufficient conditions for these distances to be at least quasi-metametrics. A counterexample to the triangle inequality is provided by these 3 objects and 2 features: W = [1, 0; 2, 6; 0, 12]. It is straightforward to check that $d_g^{(1)}(1, 2) + d_g^{(1)}(2, 3) - d_g^{(1)}(1, 3) = 1/3 + 1/2 - 1 < 0$, and also $d_r^{(1)}(3, 2) + d_r^{(1)}(2, 1) - d_r^{(1)}(3, 1) = 1/2 + 1/3 - 1 < 0$. The reason for the failure of the triangle inequality is that different objects have distinct total feature sums $p_{ii}$.

Now we are in a position to prove Theorem 3.2:

Proof. Based on the discussion above, we only need to prove the triangle inequality. Again, without loss of generality, we only need to prove $d_g^{(1)}(1, 2) + d_g^{(1)}(2, 3) \ge d_g^{(1)}(1, 3)$. Following the proof of Proposition 6.3, it suffices to show that

$$\sum_{k=1}^{m} \frac{(w_{1k}/p_{11} + w_{3k}/p_{22}) w_{2k} - w_{1k} w_{3k}/p_{11}}{q_{kk}} \le 1. \qquad (9)$$

Following the same argument as in the proof of Proposition 6.3, we have

$$\sum_{k=1}^{m} \frac{(w_{1k}/p_{11} + w_{3k}/p_{22}) w_{2k} - w_{1k} w_{3k}/p_{11}}{q_{kk}} \le \sum_{k=1}^{m} \frac{(w_{1k}/p_{11} + w_{3k}/p_{22}) w_{2k}}{w_{1k} + w_{2k} + w_{3k}} \le \frac{2}{3} \cdot \frac{\max_i p_{ii}}{\min_i p_{ii}} < 1,$$

which completes the proof of Theorem 3.2.
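To close, a short numeric check (ours, not part of the paper) of the counterexample and of the tightness construction in Proposition 6.3:

```python
import numpy as np

def g1(W):
    # Order-1 diffusion similarity, eq. (5).
    return (W / W.sum(axis=1, keepdims=True)) @ (W / W.sum(axis=0)).T

# Counterexample: distinct row sums break the triangle inequality.
W = np.array([[1.0, 0.0], [2.0, 6.0], [0.0, 12.0]])
D = 1.0 - g1(W)
print(D[0, 1] + D[1, 2] - D[0, 2])        # -1/6 < 0

# Tightness of the 2/3 coefficient in Proposition 6.3.
Wt = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])   # row-stochastic
G = g1(Wt)
print(G[0, 1] + G[1, 2] - G[0, 2])        # exactly 2/3
```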


LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

Calculus of Variations

Calculus of Variations Calculus of Variations Lagrangian formalism is the main tool of theoretical classical mechanics. Calculus of Variations is a part of Mathematics which Lagrangian formalism is base on. In this section,

More information

Generalization of the persistent random walk to dimensions greater than 1

Generalization of the persistent random walk to dimensions greater than 1 PHYSICAL REVIEW E VOLUME 58, NUMBER 6 DECEMBER 1998 Generalization of the persistent ranom walk to imensions greater than 1 Marián Boguñá, Josep M. Porrà, an Jaume Masoliver Departament e Física Fonamental,

More information

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,

More information

Necessary and Sufficient Conditions for Sketched Subspace Clustering

Necessary and Sufficient Conditions for Sketched Subspace Clustering Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

THE GENUINE OMEGA-REGULAR UNITARY DUAL OF THE METAPLECTIC GROUP

THE GENUINE OMEGA-REGULAR UNITARY DUAL OF THE METAPLECTIC GROUP THE GENUINE OMEGA-REGULAR UNITARY DUAL OF THE METAPLECTIC GROUP ALESSANDRA PANTANO, ANNEGRET PAUL, AND SUSANA A. SALAMANCA-RIBA Abstract. We classify all genuine unitary representations of the metaplectic

More information

The Principle of Least Action

The Principle of Least Action Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

Lagrangian and Hamiltonian Mechanics

Lagrangian and Hamiltonian Mechanics Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical

More information

Math 342 Partial Differential Equations «Viktor Grigoryan

Math 342 Partial Differential Equations «Viktor Grigoryan Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite

More information

On the Surprising Behavior of Distance Metrics in High Dimensional Space

On the Surprising Behavior of Distance Metrics in High Dimensional Space On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com

More information

Markov Chains in Continuous Time

Markov Chains in Continuous Time Chapter 23 Markov Chains in Continuous Time Previously we looke at Markov chains, where the transitions betweenstatesoccurreatspecifietime- steps. That it, we mae time (a continuous variable) avance in

More information

A Novel Decoupled Iterative Method for Deep-Submicron MOSFET RF Circuit Simulation

A Novel Decoupled Iterative Method for Deep-Submicron MOSFET RF Circuit Simulation A Novel ecouple Iterative Metho for eep-submicron MOSFET RF Circuit Simulation CHUAN-SHENG WANG an YIMING LI epartment of Mathematics, National Tsing Hua University, National Nano evice Laboratories, an

More information

Chapter 6: Energy-Momentum Tensors

Chapter 6: Energy-Momentum Tensors 49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.

More information

Situation awareness of power system based on static voltage security region

Situation awareness of power system based on static voltage security region The 6th International Conference on Renewable Power Generation (RPG) 19 20 October 2017 Situation awareness of power system base on static voltage security region Fei Xiao, Zi-Qing Jiang, Qian Ai, Ran

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

Fast image compression using matrix K-L transform

Fast image compression using matrix K-L transform Fast image compression using matrix K-L transform Daoqiang Zhang, Songcan Chen * Department of Computer Science an Engineering, Naning University of Aeronautics & Astronautics, Naning 2006, P.R. China.

More information

Computing Derivatives

Computing Derivatives Chapter 2 Computing Derivatives 2.1 Elementary erivative rules Motivating Questions In this section, we strive to unerstan the ieas generate by the following important questions: What are alternate notations

More information

Capacity Analysis of MIMO Systems with Unknown Channel State Information

Capacity Analysis of MIMO Systems with Unknown Channel State Information Capacity Analysis of MIMO Systems with Unknown Channel State Information Jun Zheng an Bhaskar D. Rao Dept. of Electrical an Computer Engineering University of California at San Diego e-mail: juzheng@ucs.eu,

More information

Lecture 6: Calculus. In Song Kim. September 7, 2011

Lecture 6: Calculus. In Song Kim. September 7, 2011 Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear

More information

Calculus in the AP Physics C Course The Derivative

Calculus in the AP Physics C Course The Derivative Limits an Derivatives Calculus in the AP Physics C Course The Derivative In physics, the ieas of the rate change of a quantity (along with the slope of a tangent line) an the area uner a curve are essential.

More information

New Statistical Test for Quality Control in High Dimension Data Set

New Statistical Test for Quality Control in High Dimension Data Set International Journal of Applie Engineering Research ISSN 973-456 Volume, Number 6 (7) pp. 64-649 New Statistical Test for Quality Control in High Dimension Data Set Shamshuritawati Sharif, Suzilah Ismail

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

On the Aloha throughput-fairness tradeoff

On the Aloha throughput-fairness tradeoff On the Aloha throughput-fairness traeoff 1 Nan Xie, Member, IEEE, an Steven Weber, Senior Member, IEEE Abstract arxiv:1605.01557v1 [cs.it] 5 May 2016 A well-known inner boun of the stability region of

More information

Variable Independence and Resolution Paths for Quantified Boolean Formulas

Variable Independence and Resolution Paths for Quantified Boolean Formulas Variable Inepenence an Resolution Paths for Quantifie Boolean Formulas Allen Van Geler http://www.cse.ucsc.eu/ avg University of California, Santa Cruz Abstract. Variable inepenence in quantifie boolean

More information

Vectors in two dimensions

Vectors in two dimensions Vectors in two imensions Until now, we have been working in one imension only The main reason for this is to become familiar with the main physical ieas like Newton s secon law, without the aitional complication

More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10 Some vector algebra an the generalize chain rule Ross Bannister Data Assimilation Research Centre University of Reaing UK Last upate 10/06/10 1. Introuction an notation As we shall see in these notes the

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

Robustness and Perturbations of Minimal Bases

Robustness and Perturbations of Minimal Bases Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important

More information

Lecture 10 Notes, Electromagnetic Theory II Dr. Christopher S. Baird, faculty.uml.edu/cbaird University of Massachusetts Lowell

Lecture 10 Notes, Electromagnetic Theory II Dr. Christopher S. Baird, faculty.uml.edu/cbaird University of Massachusetts Lowell Lecture 10 Notes, Electromagnetic Theory II Dr. Christopher S. Bair, faculty.uml.eu/cbair University of Massachusetts Lowell 1. Pre-Einstein Relativity - Einstein i not invent the concept of relativity,

More information

Limitations of One-Hidden-Layer Perceptron Networks

Limitations of One-Hidden-Layer Perceptron Networks J. Yaghob (E.): ITAT 2015 pp. 167 171 Charles University in Prague, Prague, 2015 Limitations of One-Hien-Layer Perceptron Networks Věra Kůrková Institute of Computer Science, Czech Acaemy of Sciences,

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 12 EFFICIENT LEARNING So far, our focus has been on moels of learning an basic algorithms for those moels. We have not place much emphasis on how to learn quickly.

More information

What s in an Attribute? Consequences for the Least Common Subsumer

What s in an Attribute? Consequences for the Least Common Subsumer What s in an Attribute? Consequences for the Least Common Subsumer Ralf Küsters LuFG Theoretical Computer Science RWTH Aachen Ahornstraße 55 52074 Aachen Germany kuesters@informatik.rwth-aachen.e Alex

More information

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France APPROXIMAE SOLUION FOR RANSIEN HEA RANSFER IN SAIC URBULEN HE II B. Bauouy CEA/Saclay, DSM/DAPNIA/SCM 91191 Gif-sur-Yvette Ceex, France ABSRAC Analytical solution in one imension of the heat iffusion equation

More information

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016 Noname manuscript No. (will be inserte by the eitor) Scaling properties of the number of ranom sequential asorption iterations neee to generate saturate ranom packing arxiv:607.06668v2 [con-mat.stat-mech]

More information

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number

More information

arxiv: v2 [math.st] 29 Oct 2015

arxiv: v2 [math.st] 29 Oct 2015 EXPONENTIAL RANDOM SIMPLICIAL COMPLEXES KONSTANTIN ZUEV, OR EISENBERG, AND DMITRI KRIOUKOV arxiv:1502.05032v2 [math.st] 29 Oct 2015 Abstract. Exponential ranom graph moels have attracte significant research

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

arxiv:hep-th/ v1 3 Feb 1993

arxiv:hep-th/ v1 3 Feb 1993 NBI-HE-9-89 PAR LPTHE 9-49 FTUAM 9-44 November 99 Matrix moel calculations beyon the spherical limit arxiv:hep-th/93004v 3 Feb 993 J. Ambjørn The Niels Bohr Institute Blegamsvej 7, DK-00 Copenhagen Ø,

More information

Improving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers

Improving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers International Journal of Statistics an Probability; Vol 6, No 5; September 207 ISSN 927-7032 E-ISSN 927-7040 Publishe by Canaian Center of Science an Eucation Improving Estimation Accuracy in Nonranomize

More information

Hybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion

Hybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion Hybri Fusion for Biometrics: Combining Score-level an Decision-level Fusion Qian Tao Raymon Velhuis Signals an Systems Group, University of Twente Postbus 217, 7500AE Enschee, the Netherlans {q.tao,r.n.j.velhuis}@ewi.utwente.nl

More information

Axiometrics: Axioms of Information Retrieval Effectiveness Metrics

Axiometrics: Axioms of Information Retrieval Effectiveness Metrics Axiometrics: Axioms of Information Retrieval Effectiveness Metrics ABSTRACT Ey Maalena Department of Maths Computer Science University of Uine Uine, Italy ey.maalena@uniu.it There are literally ozens most

More information

Counting Lattice Points in Polytopes: The Ehrhart Theory

Counting Lattice Points in Polytopes: The Ehrhart Theory 3 Counting Lattice Points in Polytopes: The Ehrhart Theory Ubi materia, ibi geometria. Johannes Kepler (1571 1630) Given the profusion of examples that gave rise to the polynomial behavior of the integer-point

More information

Stable and compact finite difference schemes

Stable and compact finite difference schemes Center for Turbulence Research Annual Research Briefs 2006 2 Stable an compact finite ifference schemes By K. Mattsson, M. Svär AND M. Shoeybi. Motivation an objectives Compact secon erivatives have long

More information

Two formulas for the Euler ϕ-function

Two formulas for the Euler ϕ-function Two formulas for the Euler ϕ-function Robert Frieman A multiplication formula for ϕ(n) The first formula we want to prove is the following: Theorem 1. If n 1 an n 2 are relatively prime positive integers,

More information

Calculus and optimization

Calculus and optimization Calculus an optimization These notes essentially correspon to mathematical appenix 2 in the text. 1 Functions of a single variable Now that we have e ne functions we turn our attention to calculus. A function

More information

The canonical controllers and regular interconnection

The canonical controllers and regular interconnection Systems & Control Letters ( www.elsevier.com/locate/sysconle The canonical controllers an regular interconnection A.A. Julius a,, J.C. Willems b, M.N. Belur c, H.L. Trentelman a Department of Applie Mathematics,

More information

Crowdsourced Judgement Elicitation with Endogenous Proficiency

Crowdsourced Judgement Elicitation with Endogenous Proficiency Crowsource Jugement Elicitation with Enogenous Proficiency Anirban Dasgupta Arpita Ghosh Abstract Crowsourcing is now wiely use to replace jugement or evaluation by an expert authority with an aggregate

More information

Short Intro to Coordinate Transformation

Short Intro to Coordinate Transformation Short Intro to Coorinate Transformation 1 A Vector A vector can basically be seen as an arrow in space pointing in a specific irection with a specific length. The following problem arises: How o we represent

More information

arxiv: v1 [math.mg] 10 Apr 2018

arxiv: v1 [math.mg] 10 Apr 2018 ON THE VOLUME BOUND IN THE DVORETZKY ROGERS LEMMA FERENC FODOR, MÁRTON NASZÓDI, AND TAMÁS ZARNÓCZ arxiv:1804.03444v1 [math.mg] 10 Apr 2018 Abstract. The classical Dvoretzky Rogers lemma provies a eterministic

More information

The Exact Form and General Integrating Factors

The Exact Form and General Integrating Factors 7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily

More information

Convergence of Random Walks

Convergence of Random Walks Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of

More information

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay

More information

In the usual geometric derivation of Bragg s Law one assumes that crystalline

In the usual geometric derivation of Bragg s Law one assumes that crystalline Diffraction Principles In the usual geometric erivation of ragg s Law one assumes that crystalline arrays of atoms iffract X-rays just as the regularly etche lines of a grating iffract light. While this

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu

More information