Incremental Commute Time Using Random Walks and Online Anomaly Detection


Incremental Commute Time Using Random Walks and Online Anomaly Detection

Nguyen Lu Dang Khoa (1) and Sanjay Chawla (2,3)
(1) Data61, CSIRO, Australia, khoa.nguyen@data61.csiro.au; (2) Qatar Computing Research Institute, HBKU; (3) University of Sydney, Australia, sanjay.chawla@sydney.edu.au

Abstract. Commute time is a random-walk-based metric on graphs and has found widespread successful applications in many domains. However, computing the commute time is expensive, as it involves the eigen decomposition of the graph Laplacian matrix. There have been efforts to approximate the commute time in offline mode. Our interest is inspired by the use of commute time in online mode. We propose an accurate and efficient approximation for computing the commute time in an incremental fashion in order to facilitate real-time applications. An online anomaly detection technique is designed in which the commute time from each newly arriving data point to any point in the current graph can be estimated in constant time, ensuring a real-time response. The proposed approach shows high accuracy and efficiency on many synthetic and real datasets, and takes only 8 milliseconds on average to detect anomalies online on the DBLP graph, which has more than 600,000 nodes and 2 million edges.

Keywords: Commute time, random walk, incremental learning, online anomaly detection

1 Introduction

Commute time is a well-known measure derived from random walks on graphs [10]. The commute time between two nodes i and j in a graph is the expected number of steps that a random walk starting from i will take to visit j and then come back to i for the first time. Commute time has been used as a robust metric for different learning tasks such as clustering [14] and anomaly detection [8]. It has also found widespread applications in personalized search [16], collaborative filtering [3] and image segmentation [14].
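The random-walk definition of commute time can be checked with a direct Monte Carlo simulation. The sketch below uses a hypothetical toy graph (a path a-b-c with unit edge weights, not a graph from the paper) and estimates the commute time between the two leftmost nodes by repeatedly walking there and back:

```python
import numpy as np

# Hypothetical toy graph: a path a-b-c with unit edge weights.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
deg = A.sum(axis=1)
rng = np.random.default_rng(0)

def steps_to_reach(start, target):
    """Steps a random walk takes from `start` until it first reaches `target`."""
    node, steps = start, 0
    while node != target:
        # move to a neighbor with probability proportional to edge weight
        node = rng.choice(len(deg), p=A[node] / deg[node])
        steps += 1
    return steps

# Commute time c_ab = E[steps a -> b] + E[steps b -> a], estimated by simulation.
n_walks = 20000
est = np.mean([steps_to_reach(0, 1) + steps_to_reach(1, 0) for _ in range(n_walks)])
print(round(est, 2))   # close to the exact value 4 for this graph
```

For this path graph the exact value is 4: the walk from the end node reaches the middle in exactly one step, and the expected return trip takes three more steps.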
The fact that the commute time is averaged over all paths (and not just the shortest path) makes it more robust to data perturbations. More advanced measures generally require more expensive computation. Estimating commute time involves the eigen decomposition of the graph Laplacian matrix, resulting in O(n^3) time complexity, which is impractical for

large graphs. Saerens, Pirotte and Fouss [15] used subspace approximation to approximate the commute time. Sarkar and Moore [13] introduced a notion of truncated commute time and a pruning algorithm to find nearest neighbors in the truncated commute time. Recently, Spielman and Srivastava [17] proposed an approximation algorithm to create a structure in nearly linear time so that any pairwise commute time can be approximated in O(log n) time. However, all the above-mentioned approximation techniques work in a batch fashion and therefore have a high computation cost for online applications. We are interested in the following scenario: a dataset or a graph D is given from an underlying domain of interest, such as a network traffic log or a social network graph. A new data point p arrives and we want to determine whether p is an anomaly with respect to D in commute time. A data point is an anomaly if it is far away from its nearest neighbors in the commute time measure (as described in [8]). This particular application requires the computation of commute time in an online fashion. In this paper, we propose a method called iECT to incrementally estimate the commute time, and we use it to design an online anomaly detection application. The method makes use of the recursive definition of commute time in terms of random walk measures. The commute time from a new data point to any data point in the existing data D is computed based on the current commute times among points in D. The method is novel and reveals insights about commute time which are independent of the applications. The contributions of this paper are as follows. We use characteristics of random walk measures to propose a method to estimate the commute time incrementally in constant time. We then design an online anomaly detection technique using the incremental commute time. To the best of our knowledge, this is the first method to estimate the commute time in an online fashion.
The proposed technique is verified by experiments in different applications using several synthetic and real datasets. The experiments show the effectiveness of the proposed methods in terms of accuracy and performance. The methods can be applied directly to graph data and can be used in any application that utilizes the commute time (e.g. classification and graph ranking using commute time). The remainder of the paper is organized as follows. Section 2 reviews notation and concepts related to random walks and commute time, and a method to approximate the commute time offline in large graphs. Section 3 presents a simple motivating example to tie together all the definitions and ideas, and proposes a method to incrementally estimate the commute time. In Section 4, we propose an online anomaly detection algorithm which uses the incremental commute time. We evaluate our approaches using experiments on synthetic and real datasets in Section 5. Sections 6 and 7 cover the related work and a summary of our work.

2 Background

2.1 Random Walks on Graphs and Commute Time

We provide a self-contained introduction to random walks with an emphasis on commute time. Assume we are given a connected, undirected and weighted graph G = (V, E, W).

Definition 1. Let $i$ be a node in $G$ and $N(i)$ be its neighbors. The degree $d_i$ of node $i$ is $d_i = \sum_{j \in N(i)} w_{ij}$. The volume $V_G$ of the graph is defined as $V_G = \sum_{i \in V} d_i$.

Definition 2. The transition matrix $M = (p_{ij})_{i,j \in V}$ of a random walk on $G$ is given by
$$p_{ij} = \begin{cases} \frac{w_{ij}}{d_i} & \text{if } (i,j) \in E \\ 0 & \text{otherwise.} \end{cases}$$

Definition 3. The hitting time $h_{ij}$ is the expected number of steps that a random walk starting at $i$ will take before reaching $j$ for the first time.

Definition 4. The hitting time satisfies the recursion
$$h_{ij} = \begin{cases} 1 + \sum_{l \in N(i)} p_{il} h_{lj} & \text{if } i \neq j \\ 0 & \text{otherwise.} \end{cases}$$

Definition 5. The commute time $c_{ij}$ between two nodes $i$ and $j$ is given by $c_{ij} = h_{ij} + h_{ji}$.

Fact 1. The commute time can be expressed in terms of the Laplacian of $G$:
$$c_{ij} = V_G (l^+_{ii} + l^+_{jj} - 2 l^+_{ij}) = V_G (e_i - e_j)^T L^+ (e_i - e_j) \qquad (1)$$
where $l^+_{ij}$ is the $(i, j)$ element of $L^+$ (the pseudo-inverse of the Laplacian $L$) and $e_i$ is the $|V|$-dimensional column vector with 1 at location $i$ and zero elsewhere [3]. $L^+$ can be computed from the eigensystem of $L$: $L^+ = \sum_{i=2}^{|V|} \frac{1}{\lambda_i} v_i v_i^T$.

2.2 Approximation of the Commute Time Embedding (Batch Mode)

Computing commute time involves the eigen decomposition of the graph Laplacian matrix, which is impractical for large graphs. Recently, Spielman and Srivastava [17] proposed an approximation algorithm utilizing random projection and an SDD solver to create a structure in nearly linear time so that any pairwise commute time can be approximated in $k_{RP} = O(\log n)$ time ($k_{RP}$ is the reduced dimension in the random projection). The fast SDD solver [18] for linear systems is a new class of near-linear time methods for solving a system of equations $Ax = b$ when $A$ is a symmetric diagonally dominant (SDD) matrix.
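Fact 1 can be illustrated in a few lines. The sketch below (using the same assumed toy path graph as before, not a graph from the paper) builds $L^+$ from the eigensystem of $L$, skipping the zero eigenvalue, and evaluates Equation 1:

```python
import numpy as np

# Assumed toy graph: a path a-b-c with unit edge weights.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
deg = A.sum(axis=1)
L = np.diag(deg) - A           # graph Laplacian
V_G = deg.sum()                # graph volume (sum of degrees)

# L^+ from the eigensystem of L: sum over the non-zero eigenvalues only.
lam, v = np.linalg.eigh(L)     # eigenvalues in ascending order; lam[0] ~ 0
Lp = sum((1.0 / lam[i]) * np.outer(v[:, i], v[:, i]) for i in range(1, len(lam)))

def commute_time(i, j):
    """c_ij = V_G (e_i - e_j)^T L^+ (e_i - e_j), Equation 1."""
    e = np.zeros(len(deg)); e[i], e[j] = 1.0, -1.0
    return V_G * e @ Lp @ e

print(commute_time(0, 1), commute_time(0, 2))   # 4.0 and 8.0 for this path
```

Equivalently, $c_{ij}$ is the graph volume times the effective resistance between $i$ and $j$ when edges are viewed as resistors, which is how the values 4 and 8 for this path can be checked by hand.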
The idea is based on the fact that $\theta = \sqrt{V_G}\, L^+ B^T W^{1/2}$ is a commute time embedding, in which the commute time $c_{ij}$ is the squared Euclidean distance between points $i$ and $j$ in $\theta$. Here $m$ is the number of edges in $G$, $B$ is the $m \times n$ signed edge-vertex incidence matrix, and $W$ is the $m \times m$ diagonal matrix whose entries are the edge weights. For the details of the embedding creation, refer to [17].
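The embedding identity can be verified numerically. The sketch below (again on an assumed toy path graph) assembles $B$ and $W$, forms $\theta$, and checks that the squared Euclidean distance between two rows of $\theta$ reproduces Equation 1:

```python
import numpy as np

# Assumed toy graph: path a-b-c; edges (0,1) and (1,2) with unit weights.
edges = [(0, 1), (1, 2)]
weights = np.array([1.0, 1.0])
n, m = 3, len(edges)

# Signed edge-vertex incidence matrix B (m x n), diagonal weight matrix W (m x m).
B = np.zeros((m, n))
for k, (u, v) in enumerate(edges):
    B[k, u], B[k, v] = 1.0, -1.0
W = np.diag(weights)

L = B.T @ W @ B                     # Laplacian assembled from B and W
V_G = 2.0 * weights.sum()           # volume = twice the total edge weight
Lp = np.linalg.pinv(L)

# Commute time embedding: each row of theta is one embedded node.
theta = np.sqrt(V_G) * Lp @ B.T @ np.sqrt(W)

dist_sq = np.sum((theta[0] - theta[1]) ** 2)            # squared distance in theta
exact = V_G * (Lp[0, 0] + Lp[1, 1] - 2.0 * Lp[0, 1])    # Equation 1
print(dist_sq, exact)   # both equal the commute time between the two nodes
```

In [17] the rows of this $m$-dimensional embedding are additionally compressed by a random projection down to $k_{RP}$ dimensions; the sketch above omits that step to keep the identity exact.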

3 Incremental Commute Time

3.1 Problem and Scope

Problem: given a dataset or a graph D from an underlying domain of interest, when a new data instance p arrives, we want to compute the commute time from p to any data instance in D.

In a Euclidean space, the insertion of a new point does not change the features of existing points. However, the insertion of a new node in the original feature space or in a graph will change the features of existing points in the commute time embedding space, which is spanned by eigenvectors of the graph Laplacian matrix. Updating the eigensystem of a graph Laplacian is costly and not suitable for online applications. In this work, we use the characteristics of random walk measures to estimate the commute time incrementally in constant time and use it to design online applications.

There are some notes regarding the scope of this work. Firstly, the proposed method is only suitable for applications which do not need to update the training model over time (i.e. representative training data are available). That means we treat the new data points one by one, estimate the corresponding commute times, and leave the trained model intact. Secondly, in the case of graph data, we only deal with node insertion, not node deletion or edge weight updates.

3.2 Motivating Examples

Consider the graph G shown in Figure 1a, where all the edge weights equal 1. The sum of the degrees of the nodes is V_G = 8. We will calculate the commute time c_12 in two different ways.

Fig. 1: (a) 4-node graph; (b) adding node 5. c_12 increases after the addition of node 5 even though the shortest path distance remains unchanged.

1. Using the random walk approach: note that the expected number of steps for a random walk starting at node 1 and returning to it is $V_G/d_1 = 8/1 = 8$ [10]. But the walk from node 1 can only go to node 2 and must return from node 2 to node 1. Thus $c_{12} = 8$.

2. Using the algebraic approach: the Laplacian matrix is L =

and the pseudo-inverse is L+ =

Since $c_{12} = V_G (e_1 - e_2)^T L^+ (e_1 - e_2)$ and $(e_1 - e_2)^T L^+ (e_1 - e_2) = 1$, we get $c_{12} = V_G \cdot 1 = 8$.

Suppose we add a new node (labeled 5) to node 4 with a unit weight, as in Figure 1b. Then $c^{new}_{12} = V_G^{new}/d_1 = 10/1 = 10$. The example in Figure 1b shows that by adding an edge, i.e. making the cluster which contains node 2 denser, $c_{12}$ increases. This shows that the commute time between two nodes captures not only the distance between them (as measured by the edge weights) but also the data densities. For a proof of this claim, see [8]. This property of commute time has been used to simultaneously discover global and local anomalies in data, an important problem in the anomaly detection literature.

In the above example, we exploited the specific topology (a degree-one node) of the graph to calculate the commute time efficiently. This can only work for very specific instances. The general, more widely used but slower approach for computing the commute time is the Laplacian formula of Equation 1. One key contribution of this paper is that, for an incremental computation of commute time, we can use insights from this example to efficiently approximate the commute time using random walks in much more general situations.

3.3 Incremental Estimation of Commute Time

In this section, we derive a new method for computing the commute time in an incremental fashion. This method uses the definition of commute time based on the hitting time. The basic intuition is to expand the hitting time recursion until the random walk has moved a few steps away from the new node and then use the old values. In Section 5 we will show that this method results in remarkable agreement between the batch and online modes. We deal with two cases, shown in Figure 2:

1. Rank one perturbation corresponds to the situation when the new node connects to one node in the existing graph.

2.
Rank k perturbation deals with the situation when the new node has k neighbors in the existing graph.
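Before treating the two cases, the motivating example of Section 3.2 can be replayed numerically. The figure itself is not reproduced here, so the sketch below assumes a topology consistent with the text: node 1 pendant on node 2, nodes 2, 3, 4 forming a triangle (so the sum of degrees is 8), and node 5 then attached to node 4:

```python
import numpy as np

def commute_time(A, i, j):
    """c_ij = V_G (e_i - e_j)^T L^+ (e_i - e_j) from the adjacency matrix A."""
    deg = A.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(deg) - A)
    e = np.zeros(len(A)); e[i], e[j] = 1.0, -1.0
    return deg.sum() * e @ Lp @ e

# Assumed 4-node topology (0-indexed nodes 0..3 stand for nodes 1..4):
# node 1 pendant on node 2, and nodes 2-3-4 forming a triangle; V_G = 8.
A4 = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3), (1, 3)]:
    A4[u, v] = A4[v, u] = 1.0
print(commute_time(A4, 0, 1))   # 8.0, matching V_G / d_1

# Attach node 5 to node 4 with unit weight: the volume grows to 10 and
# c_12 grows to 10, although the shortest path from 1 to 2 is unchanged.
A5 = np.zeros((5, 5))
A5[:4, :4] = A4
A5[3, 4] = A5[4, 3] = 1.0
print(commute_time(A5, 0, 1))   # 10.0
```

Any 4-node topology with node 1 of degree one and total volume 8 gives the same values for $c_{12}$, since node 1 is pendant on node 2 in both cases.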

Fig. 2: Rank 1 and rank k perturbation when a new data point arrives.

Rank one perturbation

Proposition 1. Let $i$ be a new node connected by one edge to an existing node $l$ in the graph $G$, let $w_{il}$ be the weight of the new edge, and let $j$ be an arbitrary node in $G$. Then
$$c_{ij} = c^{old}_{lj} + \frac{V_G}{w_{il}} + O\left(\frac{1}{k}\right) \qquad (2)$$
where "old" refers to commute times in the graph $G$ (a $k$-nearest-neighbor graph) before adding $i$.

Proof (sketch). Since the random walk needs to pass through $l$ before reaching $j$, the commute distance from $i$ to $j$ is
$$c_{ij} = c_{il} + c_{lj}. \qquad (3)$$
It is known that
$$c_{il} = \frac{V_G + 2w_{il}}{w_{il}} \qquad (4)$$
where $V_G$ is the volume of graph $G$ [8]. We also know $c_{lj} = h_{jl} + h_{lj}$ and $h_{jl} = h^{old}_{jl}$. The only unknown factor is $h_{lj}$. By definition,
$$h_{lj} = 1 + \sum_{q \in N(l)} p_{lq} h_{qj} = 1 + \sum_{q \in N(l), q \neq i} p_{lq} h_{qj} + p_{li} h_{ij}.$$
Since commute time is robust against small changes or perturbations in the data, we have $h_{qj} \approx h^{old}_{qj}$. Moreover, $p_{lq} = (1 - p_{li}) p^{old}_{lq}$ and $h_{ij} = 1 + h_{lj}$. Therefore,
$$h_{lj} \approx 1 + (1 - p_{li}) \sum_{q \in N(l), q \neq i} p^{old}_{lq} h^{old}_{qj} + p_{li}(1 + h_{lj}) = 1 + (1 - p_{li})(h^{old}_{lj} - 1) + p_{li}(1 + h_{lj}),$$
where the last step uses the recursion $h^{old}_{lj} = 1 + \sum_q p^{old}_{lq} h^{old}_{qj}$. After simplification, $h_{lj} = h^{old}_{lj} + \frac{2p_{li}}{1 - p_{li}}$. Then $c_{lj} \approx h^{old}_{jl} + h^{old}_{lj} + \frac{2p_{li}}{1 - p_{li}}$. Since there is only one edge connecting $i$ to $G$, $i$ is likely an isolated point and thus $p_{li} = O(\frac{1}{k})$ ($G$ is the $k$-nearest-neighbor graph). Then
$$c_{lj} = h^{old}_{jl} + h^{old}_{lj} + O\left(\frac{1}{k}\right) = c^{old}_{lj} + O\left(\frac{1}{k}\right). \qquad (5)$$

As a result, from Equations 3, 4 and 5,
$$c_{ij} = \frac{V_G + 2w_{il}}{w_{il}} + c^{old}_{lj} + O\left(\frac{1}{k}\right) = c^{old}_{lj} + \frac{V_G}{w_{il}} + O\left(\frac{1}{k}\right),$$
where the additive constant 2 is absorbed into the approximation error.

Rank k perturbation. The rank $k$ perturbation analysis is more involved, but the final formulation is an extension of the rank one case.

Proposition 2. Let $l \in G$ denote one of the $k$ neighbors of $i$, and let $j$ be a node in $G$. The approximate commute time between nodes $i$ and $j$ is
$$c_{ij} \approx \sum_{l \in N(i)} p_{il}\, c^{old}_{lj} + \frac{V_G}{d_i} + O\left(\frac{1}{k}\right). \qquad (6)$$
For the proof, see the Appendix in the supplementary document. When $k = 1$ (the rank one case), Equation 6 reduces to Equation 2.

4 Online Applications Using Incremental Commute Time

We return to our original motivation for computing incremental commute time. We are given a dataset D which is representative of the underlying domain of interest. We need to find the nearest neighbors of a new data point p in the commute time metric incrementally, and we want to check whether p is an anomaly in D.

We train on the dataset D using Algorithm 1. First, a mutual k_1-nearest-neighbor graph is constructed from the dataset. This graph connects nodes u and v if u belongs to the k_1 nearest neighbors of v and v belongs to the k_1 nearest neighbors of u [11]. Then the approximate commute time embedding θ is computed as in Section 2.2. Finally, distance-based anomaly detection with the pruning rule proposed by Bay and Schwabacher [2] is applied in θ to find the top N anomalies; that is, the distance-based method uses commute time instead of Euclidean distance. It has been shown that a distance-based approach using commute time can simultaneously identify global, local and even group anomalies in data [8]. The anomaly score used is the average commute time from a data instance to its k_2 nearest neighbors.

Pruning rule [2]: a data point is not an anomaly if its score (e.g. the average distance to its k nearest neighbors) is less than an anomaly threshold.
The threshold can be fixed or adjusted to the score of the weakest anomaly found so far. Using the pruning rule, many non-anomalies can be pruned without carrying out a full nearest-neighbor search. After training, the corresponding graph G, the commute time embedding θ, and the anomaly threshold τ are obtained (τ is the score of the weakest of the top N anomalies). We propose a method, shown in Algorithm 2 and denoted iECT, to detect anomalies online given the trained model. When a new data point p arrives, it is connected to the graph G created in the training phase so that the property of the mutual nearest-neighbor graph is maintained. The commute times are incrementally updated to estimate the anomaly score

Algorithm 1 Approximate Commute Time Distance-Based Anomaly Detection (for training).
Input: data matrix X; the numbers of nearest neighbors k_1 (for building the nearest-neighbor graph) and k_2 (for estimating the anomaly score); the number of random vectors k_RP; the number of anomalies to return N
Output: top N anomalies, anomaly threshold τ
1: Construct a mutual k-nearest-neighbor graph G from the dataset (using k_1)
2: Compute the approximate commute time embedding θ from G
3: Find the top N anomalies using a distance-based technique with the pruning rule described in [2] on θ (using k_2)
4: Return the top N anomalies and the anomaly threshold τ

Algorithm 2 Online Anomaly Detection using the incremental Estimation of Commute Time (iECT).
Input: graph G, the approximate commute time embedding θ and the anomaly threshold τ computed in the training phase, and a new arriving data point p
Output: whether p is an anomaly or not
1: Add p to G, maintaining the property of the mutual nearest-neighbor graph
2: Determine whether p is an anomaly by estimating its anomaly score incrementally using the method described in Section 3.3; use the pruning rule with threshold τ to reduce the computation
3: Return whether p is an anomaly or not

of p using the approach in Section 3.3. The embedding θ is used to compute the commute times c_old. The pruning rule is used as follows: p is not an anomaly if its average distance to its k nearest neighbors is smaller than the anomaly threshold τ. Generally, commute time is robust against small changes or perturbations in the data; therefore, only the anomaly score of the new data point needs to be estimated and compared with the anomaly threshold computed in the training phase. This claim is verified by experiments in Section 5.

Analysis. The incremental estimation of commute time in Section 3.3 requires O(k_RP) for each query of c_old in θ.
So if there are k edges added to the graph due to the addition of a new node, it takes O(k k_RP) for each query of c_ij. As explained earlier, we only need to compute the anomaly score of the new data point. Using the pruning rule with the known anomaly threshold, it takes only an O(k_2) nearest-neighbor search to determine whether the test point is an anomaly, where k_2 is the number of nearest neighbors for estimating the anomaly score. Each commute time query takes O(k k_RP) as described above. Therefore, iECT takes O(k_2 k k_RP) to determine whether a new arriving point is an anomaly. It has been suggested in [19] that $k_{RP} = 2 \ln n / \epsilon^2$ for a distance-distortion tolerance $\epsilon$, which is just 442 for a dataset of

a million data points. Therefore $k_{RP} \ll n$. Since $k, k_2 \ll n$ as well, $O(k_2 k k_{RP}) = O(1)$, resulting in a near constant time complexity for iECT. Note that this constant time complexity of iECT does not depend on the $O(k_{RP})$ cost per query of $c^{old}$ using the method in [17]: if we instead query $c^{old}$ using Equation 1 with just $k_{EV}$ eigenvectors of the Laplacian matrix L (as described in [8]), each query takes only $O(k_{EV})$, again giving a constant time complexity for iECT.

5 Experiments and Results

In this section, we evaluate the effectiveness of the online anomaly detection application using incremental commute time. The experiments were carried out on synthetic as well as real datasets. In all experiments, unless otherwise stated, the numbers of nearest neighbors were k_1 = 10 (for building the nearest-neighbor graph) and k_2 = 20 (for estimating a nearest-neighbor or anomaly score), and the number of random vectors was k_RP = 200 (for creating the commute time embedding). We used Koutis's CMG solver [9], which is available online, as the implementation of the SDD solver for creating the embedding. The choice of parameters was determined from the experiments and is analyzed in Section 5.5. Source code and data can be accessed at 0B6LuuZJnvhFdTldkMmE1clk2T28/view?usp=sharing

5.1 Approach

We split each dataset into a training set and a test set. We trained on the training set to find the top N anomalies and the threshold value τ using Algorithm 1. Then the anomaly score of each instance p in the test set was calculated based on its k_2 neighbors in the training set. If this score was greater than τ, the test instance was reported as an anomaly.
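This score-against-threshold scheme can be sketched generically. The toy code below uses plain Euclidean distances on synthetic points standing in for distances in the commute time embedding; all sizes and the three-sigma threshold rule are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def anomaly_score(point, ref, k2):
    """Average distance from `point` to its k2 nearest points in `ref`."""
    d = np.linalg.norm(ref - point, axis=1)
    return np.sort(d)[:k2].mean()

# Illustrative stand-in for the commute time embedding of a training set.
train = rng.normal(size=(500, 8))
k2 = 20

# Training scores (leave-one-out), then a three-sigma threshold tau.
scores = np.array([anomaly_score(train[i], np.delete(train, i, axis=0), k2)
                   for i in range(len(train))])
tau = scores.mean() + 3.0 * scores.std()

# Score two new arrivals against tau.
normal_pt = rng.normal(size=8)
outlier_pt = rng.normal(size=8) + 10.0      # far away from the training cloud

print(anomaly_score(outlier_pt, train, k2) > tau)   # True: reported as anomaly
print(anomaly_score(normal_pt, train, k2) > tau)
```

The pruning rule corresponds to stopping the sorted-distance scan early once the running average falls below τ; it is omitted here for brevity.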
While searching for the nearest neighbors of p, if its average distance to the nearest neighbors found so far is smaller than τ, we can stop the search since p is not an anomaly (the pruning rule). In practice, it is not trivial to know the number of anomalies N in the training data in advance in order to find the top N and set the anomaly threshold. We investigated a method to set the threshold as follows: in the training phase, we computed the anomaly scores of all the data points and obtained the mean and standard deviation of the scores. Anomalies were data points whose scores were more than three standard deviations above the mean score, and N was the number of anomalies found.

Baseline: in all experiments, the batch method (Algorithm 1) was used as the benchmark, since there is no other method that estimates commute time incrementally. Note that for both the batch and incremental methods, we need to compute only the anomaly score of the new arriving data instance, and pruning was also applied using τ. The difference is that in the batch method, the approximate commute time embedding was recomputed and the anomaly score was estimated in the new embedding space. The incremental method, on the

other hand, estimated the score incrementally using the method described in Section 3.3.

5.2 Synthetic Datasets

We created six synthetic datasets with 1,000, 10,000, 20,000, 30,000, and larger numbers of data points. Each dataset contained several clusters generated from Normal distributions and 100 random points generated from a uniform distribution, which were likely anomalies. The number, sizes and locations of the clusters were chosen randomly. Each dataset was divided into a training set and a test set. There were 100 data points in every test set, and half of them were the random anomalies mentioned above.

Experiments on robustness: We first tested the robustness of commute times between nodes in an existing graph when a new node is introduced. As the commute time c_ij is a measure of expected path distance, the hypothesis is that the addition of a new point has minimal influence on c_ij, and thus the anomaly scores of data points in the existing set are relatively unchanged. Table 1 shows the average, standard deviation, minimum and maximum of the anomaly scores of points in graph G before and after a new data point was added to G. Graph G was created from the training set of the 1,000-point dataset described above, and the result was averaged over the 100 test points in the test set. The anomaly scores of data instances in G do not change much when a new point is added to G (the change in the average score was only about 0.7%).

Table 1: Robustness of commute time. The anomaly scores of data instances in the existing graph G are relatively unchanged when a new point is added to G.

In the following experiments, the change in the eigensystem of the graph Laplacian L of the training data due to the addition of a new node was analyzed.
Figure 3a shows the average changes in the top 50 eigenvalues before and after the addition of each test point of the 1,000-point dataset. The changes are small (mostly less than 1%, and all less than 6%). Figure 3b shows the dot products of the eigenvectors corresponding to the second smallest eigenvalue (the smallest is zero) before and after the addition of each test point. The eigenvectors did not change much after a new node was added to the graph. As Equation 1 shows, since the change in the eigensystem of the Laplacian is small, the commute times between existing training nodes do not change much. All these results show that commute time is a robust measure: a small change or perturbation in the data will not result in large changes in commute times. Therefore, only the anomaly score of the new point needs to be estimated.
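This robustness is what makes the incremental estimate work. As a miniature end-to-end check, the sketch below compares the rank-one estimate of Proposition 1 against full batch recomputation on an assumed small random graph standing in for the kNN graph (all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def commute_matrix(A):
    """All-pairs commute times via the Laplacian pseudo-inverse (batch mode)."""
    deg = A.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(deg) - A)
    d = np.diag(Lp)
    return deg.sum() * (d[:, None] + d[None, :] - 2.0 * Lp)

# Assumed random graph standing in for the mutual kNN graph (n=30, ~5 links each).
n, k = 30, 5
A = np.zeros((n, n))
for i in range(n):
    for j in rng.choice([x for x in range(n) if x != i], size=k, replace=False):
        A[i, j] = A[j, i] = 1.0

C_old = commute_matrix(A)
V_G = A.sum()                     # graph volume

# New node attached to a single node l with weight w (rank-one perturbation):
# Proposition 1 gives c_ij ~ c_old[l, j] + V_G / w.
l, w = 0, 1.0
est = C_old[l] + V_G / w

# Batch answer: rebuild the graph with the node actually inserted.
A2 = np.zeros((n + 1, n + 1))
A2[:n, :n] = A
A2[n, l] = A2[l, n] = w
exact = commute_matrix(A2)[n, :n]

rel_err = np.abs(est - exact) / exact
print(rel_err.max())   # small: the incremental estimate tracks the batch values
```

The residual error is exactly the additive constant 2 from Equation 4 plus the (small) change in the old commute times caused by inserting one node, which is why the relative error stays tiny on graphs whose volume is large compared with the new node's degree.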

Fig. 3: Change in the eigensystem when new nodes were added to the graph: (a) eigenvalue changes; (b) eigenvector changes (dot products of the second eigenvectors before and after).

Experiments on effectiveness: We applied iECT to all six datasets mentioned earlier. The effectiveness of iECT and the commute time approximation are reported and discussed below. Table 2 presents the accuracy and performance of iECT on the six synthetic datasets. The average score is the average anomaly score with the pruning rule over the 100 test points; the precision and recall are for the anomalous class; the time is the average time to process each of the 100 test points. iECT captured all the anomalies, had few false alarms, and was much more efficient than the batch method. Note that the scores shown are anomaly scores with the pruning rule, and the scores for anomalies are always much higher than those for normal points; the average scores in the table are therefore dominated by the scores of anomalies.

Table 2: Effectiveness of the incremental method. iECT captured all the anomalies, had few false alarms and was much more efficient than the batch method.

There is an interesting dynamic at play between the pruning rule and the number of anomalies in the data, since there was a high proportion of anomalies in the test set (about 50%). The pruning rule only works for non-anomalies, and therefore the time to process anomalies should be much longer than for other points. Table 3 shows the details of the time to process data

points in the test set. For the batch and iECT methods, the average times to process only anomalies, only other data points (non-anomalies), and all data instances are listed in the table. In the batch method there was not much difference between the time to process anomalies and non-anomalies, since for each new data point the time to create the new commute time embedding was much higher than that of the nearest-neighbor search. For iECT, on the other hand, this gap was very large, so non-anomalies were processed much faster than anomalies. In practice, since most data points are not anomalies, iECT is very efficient. Another cost we have not yet mentioned is the time to update the graph, i.e. the time to add a new data point to an existing graph while maintaining the property of the mutual nearest-neighbor graph. Since we stored the k-d tree corresponding to the training data, the update cost was very low, as shown in Table 3.

Table 3: Performance of the incremental method. In iECT, non-anomalies were processed much faster than anomalies.

5.3 Graph Dataset

In this section, we evaluate the iECT method on a large DBLP co-authorship network to show its scalability. In this graph, nodes are authors and edge weights are the numbers of co-authored papers. Since the graph is not fully connected, we extracted its biggest component, which has 612,949 nodes and 2,345,178 edges in a snapshot taken on December 5, 2011, and which is available online. We randomly chose a test set of 50 nodes and removed them from the graph, ensuring that the graph remained connected. After training, each node was added back into the graph along with its associated edges. We trained the graph using Algorithm 1 and stored the approximate embedding in order to query c_old in the iECT algorithm.
The batch method used the approximate embedding created from the new graph after adding each test point. iECT took 8 milliseconds on average over the 50 test data points to detect whether each test point was an anomaly. The batch method, which is the fastest approximation of commute time to date, required 1,454 seconds on average to process each test data point. This dramatically highlights the constant time complexity of the iECT algorithm and suggests that iECT is highly suitable for computing commute time in an incremental fashion. Since there was no anomaly information in the random test set, we cannot report detection accuracy here. The average anomaly score over all

the test points for iECT was only 8.6% higher than for the batch method. This shows the high accuracy of the iECT approximation even on a very large graph.

5.4 Real Datasets

In this experiment, we report results for online anomaly detection on real datasets from different application domains: network intrusion detection, video surveillance and bridge damage detection.

Spambase dataset: The Spambase dataset provided by the Machine Learning Repository [4] was investigated. There are 4,601 emails in the data with 57 features each. The task is to check whether an email is spam or not. Since the dataset has duplicated data instances, and the numbers of spam and non-spam emails are not imbalanced, we removed duplicates, kept the non-spam emails, and sampled 100 spam emails from the dataset, leaving 2,631 data instances.

Computer network anomaly detection: The dataset is from a wireless mesh network at the University of Sydney deployed by NICTA [20]. A traffic generator was used to simulate traffic on the network. Packets were aggregated into one-minute time bins, and the data were collected over 24 hours. There were 391 origin-destination flows and 1,270 time bins. Anomalies, including DoS attacks and ping floods, were introduced to the network. After removing duplicates in the data, we had 1,193 time-bin instances.

Damage detection on a bridge: The Sydney Harbour Bridge is one of the major bridges in Australia, opened in 1932. As the bridge ages, it is critical to ensure that it stays structurally healthy. There are 800 jack arches on the underside of the deck of the bus lane (lane seven) that need to be monitored. Vibration data caused by passing vehicles were recorded by three-axis accelerometers installed under the deck of lane seven. For this case study, only six instrumented joints were considered (named 1 to 6).
The data were obtained in the period from early August until late October. A known crack existed in joint 4 while the other joints were in good condition. Feature extraction was performed as described in [7]. A dataset was created to include vibration events from all healthy joints and 100 events from the damaged joint (2,523 events in total).

Each dataset was divided into a training set and a test set with 100 data points, except that in the video dataset the test set contained only 38 data objects. The anomaly threshold τ was set based on the training data as the weakest score of the anomalies in the training set. The results of the iECT and batch methods are shown in Table 4: iECT has high detection accuracy and is much more efficient than the batch method, and the commute time scores of iECT and the batch method were quite similar.

5.5 Impact of Parameters

In this section, we investigate how the parameters k_1, k_2 and k_RP affect the effectiveness of the proposed method. Parameters k_1 and k_2 only affect the accuracy of computing commute time in batch mode and were analyzed in [8]. Therefore, this section analyses the impact of k_RP on the incremental commute time.
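The insensitivity to k_RP is what Johnson-Lindenstrauss-style random projection predicts. The sketch below (on synthetic high-dimensional points standing in for the un-projected embedding; all dimensions are illustrative) measures pairwise-distance distortion for several values of k_RP:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative stand-in for an m-dimensional embedding before projection.
n, m = 200, 5000
X = rng.normal(size=(n, m))

def max_distortion(k_rp, n_pairs=200):
    """Worst relative distortion of sampled pairwise squared distances after
    projecting onto k_rp random +/-1 directions (Johnson-Lindenstrauss style)."""
    Q = rng.choice([-1.0, 1.0], size=(m, k_rp)) / np.sqrt(k_rp)
    Y = X @ Q
    a = rng.integers(0, n, size=n_pairs)
    b = (a + rng.integers(1, n, size=n_pairs)) % n          # distinct partners
    d_orig = ((X[a] - X[b]) ** 2).sum(axis=1)
    d_proj = ((Y[a] - Y[b]) ** 2).sum(axis=1)
    return np.abs(d_proj / d_orig - 1.0).max()

for k_rp in (50, 200, 800):
    print(k_rp, max_distortion(k_rp))   # distortion generally shrinks as k_rp grows
```

The distortion decays roughly as $1/\sqrt{k_{RP}}$, which is consistent with a moderate value such as k_RP = 200 already giving accurate results in the experiments above.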

Table 4: The effectiveness of iECT on real datasets. iECT has high detection accuracy and is much more efficient than the batch method.

We conducted an experiment with different values of k_RP for the three real datasets mentioned in the previous section. The results in Figure 4 show that the method can achieve high accuracy with small k_RP and is not sensitive to k_RP.

Fig. 4: Accuracy versus k_RP for (a) Spambase, (b) Network and (c) Bridge. The method achieves high accuracy with small k_RP and is not sensitive to k_RP.

5.6 Summary and Discussion

The experimental results show that iECT can accurately approximate the commute time in constant time. It is much more efficient than the batch method of Algorithm 1. The results on real datasets collected from different domains and applications show a similar tendency, demonstrating the reliability and effectiveness of the proposed method.

One weakness of iECT is that it can only be used in online applications where the update of the graph is the addition of a new node, not an update of edge weights. In the case of updating edge weights, however, the method by Ning et al. [12] can be used. This method incrementally updates the eigenvalues and eigenvectors of the graph Laplacian matrix based on a change of an edge weight on the graph; the new eigenpairs of the Laplacian can then be used to update the commute time.

6 Related Work

Khoa and Chawla [8] proposed a method to find anomalies using commute time. They showed that, unlike Euclidean distance, the commute time between two nodes captures both the distance between them and their densities, so that distance-based methods such as those in [2] can capture both global and local anomalies.

Incremental learning via updates to an eigen decomposition has been studied for a long time. Early work studied the rank-one modification of the symmetric eigen decomposition [5, 6], reducing the original problem to the eigen decomposition of a diagonal matrix. Although these methods give a good approximation of the new eigenpairs, they are not suitable for today's online applications since the update requires at least O(n^2) computation. A more recent approach is based on matrix perturbation theory [1]. It uses a first-order perturbation analysis of the rank-one update of a data covariance matrix to compute the new eigenpairs, and the resulting algorithms run in linear time. An advantage of using the covariance matrix is that, if the perturbation is the insertion of a new point, the size of the covariance matrix is unchanged. The approach therefore cannot be applied directly when the matrix size grows with each insertion: in spectral clustering or commute time based anomaly detection, for example, the size of the graph Laplacian matrix increases when a new point is added to the graph. Ning et al. [12] proposed an incremental approach for spectral clustering to monitor evolving blog communities. It incrementally updates the eigenvalues and eigenvectors of the graph Laplacian after a change of an edge weight, using the first-order error of the generalized eigen system. This algorithm is only suitable for weight updates, not for the addition of a new node.

7 Conclusion

In this paper, we proposed a method to approximate the commute time incrementally and used it to design an online anomaly detection application. The method incrementally estimates the commute time in constant time using properties of random walks and hitting times.
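The hitting-time recursion behind this idea is H(i, j) = 1 + sum_k P(i, k) H(k, j) with H(j, j) = 0, where P is the transition matrix of the random walk, and the commute time is c(i, j) = H(i, j) + H(j, i). A naive fixed-point sketch of this recursion on a toy graph follows; it is for illustration only, since the paper's contribution is truncating the expansion so that each new node costs constant time:

```python
import numpy as np

def hitting_times(W, iters=500):
    """Expected hitting times via the recursion
    H(i, j) = 1 + sum_k P(i, k) * H(k, j), with H(j, j) = 0."""
    P = W / W.sum(axis=1, keepdims=True)  # random-walk transition matrix
    H = np.zeros_like(P)
    for _ in range(iters):
        H = 1.0 + P @ H                   # take one step, then hit j from there
        np.fill_diagonal(H, 0.0)          # boundary condition: already at the target
    return H

# Toy example: unweighted path graph 0 - 1 - 2
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = hitting_times(W)
C = H + H.T   # commute time: go there and come back
```

For this graph H(0, 1) = 1 and H(1, 0) = 3, so c(0, 1) = 4 and c(0, 2) = 8. Note that hitting times are not symmetric; symmetry is only recovered in the commute time.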
The main idea is to expand the hitting time recursion until the random walk has moved a few steps away from the new node and then reuse the old values. The experimental results on synthetic and real datasets show the effectiveness of the proposed approach in terms of both performance and accuracy. iect can incrementally estimate the commute time accurately, resulting in high accuracy on several datasets from different applications. It took only 8 milliseconds on average to process a newly arriving node in a graph with more than 600,000 nodes and two million edges. Moreover, the idea of this work can be extended to other applications which utilize the commute time.

References

1. Agrawal, R.K., Karmeshu: Perturbation scheme for online learning of features: Incremental principal component analysis. Pattern Recogn. 41 (2008)
2. Bay, S.D., Schwabacher, M.: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In: KDD '03: Proc. of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, NY, USA (2003)

3. Fouss, F., Renders, J.M.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering 19(3) (2007)
4. Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010)
5. Golub, G.H.: Some modified matrix eigenvalue problems. SIAM Review 15(2) (1973)
6. Gu, M., Eisenstat, S.C.: A stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem. SIAM J. Matrix Anal. Appl. 15 (1994)
7. Khoa, N.L., Zhang, B., Wang, Y., Chen, F., Mustapha, S.: Robust dimensionality reduction and damage detection approaches in structural health monitoring. Structural Health Monitoring 13(4) (2014)
8. Khoa, N.L.D., Chawla, S.: Robust outlier detection using commute time and eigenspace embedding. In: PAKDD '10: Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Berlin/Heidelberg (2010)
9. Koutis, I., Miller, G.L., Tolliver, D.: Combinatorial preconditioners and multilevel solvers for problems in computer vision and image processing. In: Proceedings of the 5th International Symposium on Advances in Visual Computing: Part I. ISVC '09, Springer-Verlag, Berlin, Heidelberg (2009)
10. Lovász, L.: Random walks on graphs: a survey. Combinatorics, Paul Erdős is Eighty 2, 1-46 (1993)
11. von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4) (2007)
12. Ning, H., Xu, W., Chi, Y., Gong, Y., Huang, T.: Incremental spectral clustering with application to monitoring of evolving blog communities. In: SIAM Int. Conf. on Data Mining (2007)
13. Sarkar, P., Moore, A.W.: A tractable approach to finding closest truncated-commute-time neighbors in large graphs. In: The 23rd Conference on Uncertainty in Artificial Intelligence (UAI) (2007)
14. Qiu, H., Hancock, E.: Clustering and embedding using commute times. IEEE TPAMI 29(11) (2007)
15. Saerens, M., Fouss, F., Yen, L., Dupont, P.: The principal components analysis of a graph, and its relationships to spectral clustering. In: Proc. of the 15th European Conference on Machine Learning (ECML 2004). Springer-Verlag (2004)
16. Sarkar, P., Moore, A.W., Prakash, A.: Fast incremental proximity search in large graphs. In: Proceedings of the 25th international conference on Machine learning. ICML '08, ACM, New York, NY, USA (2008)
17. Spielman, D.A., Srivastava, N.: Graph sparsification by effective resistances. In: Proceedings of the 40th annual ACM symposium on Theory of computing. STOC '08, ACM, New York, NY, USA (2008)
18. Spielman, D.A., Teng, S.H.: Nearly-linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. CoRR abs/cs/ (2006)
19. Venkatasubramanian, S., Wang, Q.: The Johnson-Lindenstrauss transform: An empirical study. In: Müller-Hannemann, M., Werneck, R.F.F. (eds.) ALENEX. SIAM (2011)
20. Zaidi, Z.R., Hakami, S., Landfeldt, B., Moors, T.: Real-time detection of traffic anomalies in wireless mesh networks. Wireless Networks (2009)


More information

Graph Metrics and Dimension Reduction

Graph Metrics and Dimension Reduction Graph Metrics and Dimension Reduction Minh Tang 1 Michael Trosset 2 1 Applied Mathematics and Statistics The Johns Hopkins University 2 Department of Statistics Indiana University, Bloomington November

More information

Faloutsos, Tong ICDE, 2009

Faloutsos, Tong ICDE, 2009 Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity

More information

Mixtures of Gaussians with Sparse Structure

Mixtures of Gaussians with Sparse Structure Mixtures of Gaussians with Sparse Structure Costas Boulis 1 Abstract When fitting a mixture of Gaussians to training data there are usually two choices for the type of Gaussians used. Either diagonal or

More information

Random Matrices: Invertibility, Structure, and Applications

Random Matrices: Invertibility, Structure, and Applications Random Matrices: Invertibility, Structure, and Applications Roman Vershynin University of Michigan Colloquium, October 11, 2011 Roman Vershynin (University of Michigan) Random Matrices Colloquium 1 / 37

More information

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

Single-tree GMM training

Single-tree GMM training Single-tree GMM training Ryan R. Curtin May 27, 2015 1 Introduction In this short document, we derive a tree-independent single-tree algorithm for Gaussian mixture model training, based on a technique

More information

Diffusion and random walks on graphs

Diffusion and random walks on graphs Diffusion and random walks on graphs Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural

More information

Limits of Spectral Clustering

Limits of Spectral Clustering Limits of Spectral Clustering Ulrike von Luxburg and Olivier Bousquet Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tübingen, Germany {ulrike.luxburg,olivier.bousquet}@tuebingen.mpg.de

More information

Statistical and Computational Analysis of Locality Preserving Projection

Statistical and Computational Analysis of Locality Preserving Projection Statistical and Computational Analysis of Locality Preserving Projection Xiaofei He xiaofei@cs.uchicago.edu Department of Computer Science, University of Chicago, 00 East 58th Street, Chicago, IL 60637

More information

Spectral Clustering. Guokun Lai 2016/10

Spectral Clustering. Guokun Lai 2016/10 Spectral Clustering Guokun Lai 2016/10 1 / 37 Organization Graph Cut Fundamental Limitations of Spectral Clustering Ng 2002 paper (if we have time) 2 / 37 Notation We define a undirected weighted graph

More information

Spectral clustering. Two ideal clusters, with two points each. Spectral clustering algorithms

Spectral clustering. Two ideal clusters, with two points each. Spectral clustering algorithms A simple example Two ideal clusters, with two points each Spectral clustering Lecture 2 Spectral clustering algorithms 4 2 3 A = Ideally permuted Ideal affinities 2 Indicator vectors Each cluster has an

More information

Linear Spectral Hashing

Linear Spectral Hashing Linear Spectral Hashing Zalán Bodó and Lehel Csató Babeş Bolyai University - Faculty of Mathematics and Computer Science Kogălniceanu 1., 484 Cluj-Napoca - Romania Abstract. assigns binary hash keys to

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Graph and Network Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Methods Learnt Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve

More information

Robust Motion Segmentation by Spectral Clustering

Robust Motion Segmentation by Spectral Clustering Robust Motion Segmentation by Spectral Clustering Hongbin Wang and Phil F. Culverhouse Centre for Robotics Intelligent Systems University of Plymouth Plymouth, PL4 8AA, UK {hongbin.wang, P.Culverhouse}@plymouth.ac.uk

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition Yu-Seop Kim 1, Jeong-Ho Chang 2, and Byoung-Tak Zhang 2 1 Division of Information and Telecommunication

More information

Application of Clustering to Earth Science Data: Progress and Challenges

Application of Clustering to Earth Science Data: Progress and Challenges Application of Clustering to Earth Science Data: Progress and Challenges Michael Steinbach Shyam Boriah Vipin Kumar University of Minnesota Pang-Ning Tan Michigan State University Christopher Potter NASA

More information

Spectral Bandits for Smooth Graph Functions with Applications in Recommender Systems

Spectral Bandits for Smooth Graph Functions with Applications in Recommender Systems Spectral Bandits for Smooth Graph Functions with Applications in Recommender Systems Tomáš Kocák SequeL team INRIA Lille France Michal Valko SequeL team INRIA Lille France Rémi Munos SequeL team, INRIA

More information

Lecture: Modeling graphs with electrical networks

Lecture: Modeling graphs with electrical networks Stat260/CS294: Spectral Graph Methods Lecture 16-03/17/2015 Lecture: Modeling graphs with electrical networks Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough.

More information

Robust Laplacian Eigenmaps Using Global Information

Robust Laplacian Eigenmaps Using Global Information Manifold Learning and its Applications: Papers from the AAAI Fall Symposium (FS-9-) Robust Laplacian Eigenmaps Using Global Information Shounak Roychowdhury ECE University of Texas at Austin, Austin, TX

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information