Incremental Commute Time Using Random Walks and Online Anomaly Detection
Nguyen Lu Dang Khoa (1) and Sanjay Chawla (2,3)
(1) Data61, CSIRO, Australia, khoa.nguyen@data61.csiro.au
(2) Qatar Computing Research Institute, HBKU
(3) University of Sydney, Australia, sanjay.chawla@sydney.edu.au

Abstract. Commute time is a random walk based metric on graphs and has found widespread successful applications in many application domains. However, the computation of the commute time is expensive, involving the eigen decomposition of the graph Laplacian matrix. There has been effort to approximate the commute time in offline mode. Our interest is inspired by the use of commute time in online mode. We propose an accurate and efficient approximation for computing the commute time in an incremental fashion in order to facilitate real-time applications. An online anomaly detection technique is designed where the commute time from each new arriving data point to any point in the current graph can be estimated in constant time, ensuring a real-time response. The proposed approach shows high accuracy and efficiency on many synthetic and real datasets, and takes only 8 milliseconds on average to detect anomalies online on the DBLP graph, which has more than 600,000 nodes and 2 million edges.

Keywords: Commute time, random walk, incremental learning, online anomaly detection

1 Introduction

Commute time is a well-known measure derived from random walks on graphs [10]. The commute time between two nodes i and j in a graph is the expected number of steps for a random walk starting from i to visit j and then come back to i for the first time. Commute time has been used as a robust metric for different learning tasks such as clustering [14] and anomaly detection [8]. It has also found widespread applications in personalized search [16], collaborative filtering [3] and image segmentation [14].
The fact that the commute time is averaged over all paths (and not just the shortest path) makes it more robust to data perturbations. More advanced measures generally require more expensive computation. Estimating commute time involves the eigen decomposition of the graph Laplacian matrix, resulting in an O(n^3) time complexity which is impractical for
large graphs. Saerens, Pirotte and Fouss [15] used subspace approximation to approximate the commute time. Sarkar and Moore [13] introduced a notion of truncated commute time and a pruning algorithm to find nearest neighbors in the truncated commute time. Recently, Spielman and Srivastava [17] proposed an approximation algorithm to create a structure in nearly linear time so that the pairwise commute time can be approximated in O(log n) time. However, all the above-mentioned approximation techniques work in a batch fashion and therefore have a high computation cost for online applications. We are interested in the following scenario: a dataset or a graph D is given from an underlying domain of interest, such as data from a network traffic log or a social network graph. A new data point p arrives and we want to determine if p is an anomaly with respect to D in commute time. A data point is an anomaly if it is far away from its nearest neighbors in the commute time measure (as described in [8]). This particular application requires the computation of commute time in an online fashion. In this paper, we propose a method called iect to incrementally estimate the commute time and use it to design an online anomaly detection application. The method makes use of the recursive definition of commute time in terms of random walk measures. The commute time from a new data point to any data point in the existing data D is computed based on the current commute times among points in D. The method is novel and reveals insights about commute time which are independent of the applications. The contributions of this paper are as follows: we use characteristics of random walk measures to propose a method to estimate the commute time incrementally in constant time; we then design an online anomaly detection technique using the incremental commute time. To the best of our knowledge, this is the first method to estimate the commute time in an online fashion.
The proposed technique is verified by experiments in different applications using several synthetic and real datasets. The experiments show the effectiveness of the proposed methods in terms of accuracy and performance. The methods can be applied directly to graph data and can be used in any application that utilizes the commute time (e.g. classification and graph ranking using commute time). The remainder of the paper is organized as follows. Section 2 reviews notations and concepts related to random walks and commute time, and a method to approximate the commute time offline in large graphs. Section 3 presents a simple motivating example to tie together all the definitions and ideas, and proposes a method to incrementally estimate the commute time. In Section 4, we propose an online anomaly detection algorithm which uses the incremental commute time. We evaluate our approaches using experiments on synthetic and real datasets in Section 5. Sections 6 and 7 cover related work and a summary of our work.
2 Background

2.1 Random Walks on Graphs and Commute Time

We provide a self-contained introduction to random walks with an emphasis on commute time. Assume we are given a connected, undirected and weighted graph G = (V, E, W).

Definition 1. Let i be a node in G and N(i) be its neighbors. The degree d_i of node i is d_i = Σ_{j∈N(i)} w_ij. The volume V_G of the graph is defined as V_G = Σ_{i∈V} d_i.

Definition 2. The transition matrix M = (p_ij)_{i,j∈V} of a random walk on G is given by p_ij = w_ij/d_i if (i, j) ∈ E, and p_ij = 0 otherwise.

Definition 3. The hitting time h_ij is the expected number of steps that a random walk starting at i will take before reaching j for the first time.

Definition 4. The hitting time satisfies the recursion h_ij = 1 + Σ_{l∈N(i)} p_il h_lj if i ≠ j, and h_ii = 0.

Definition 5. The commute time c_ij between two nodes i and j is given by c_ij = h_ij + h_ji.

Fact 1. The commute time can be expressed in terms of the Laplacian of G:

c_ij = V_G (l^+_ii + l^+_jj - 2 l^+_ij) = V_G (e_i - e_j)^T L^+ (e_i - e_j)    (1)

where l^+_ij is the (i, j) element of L^+ (the pseudo-inverse of the Laplacian L) and e_i is the |V|-dimensional column vector with 1 at location i and zero elsewhere [3]. L^+ can be computed from the eigensystem of L: L^+ = Σ_{i=2}^{|V|} (1/λ_i) v_i v_i^T.

2.2 Approximation of Commute Time Embedding (Batch Mode)

Computing commute time involves the eigen decomposition of the graph Laplacian matrix, which is impractical for large graphs. Recently, Spielman and Srivastava [17] proposed an approximation algorithm utilizing random projection and an SDD solver to create a structure in nearly linear time so that the pairwise commute time can be approximated in k_RP = O(log n) time (k_RP is the reduced dimension in random projection). The fast SDD solver [18] for linear systems is a new class of near-linear time methods for solving a system of equations Ax = b when A is a symmetric diagonally dominant (SDD) matrix.
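Fact 1 and the spectral formula for L^+ can be checked directly on a toy graph. The sketch below (our illustration, not the paper's implementation; dense NumPy, so only for small graphs) computes all pairwise commute times both via the pseudo-inverse and via the spectral sum:

```python
import numpy as np

def commute_times(W):
    """All pairwise commute times via Fact 1: c_ij = V_G (e_i - e_j)^T L^+ (e_i - e_j).
    W is a symmetric weighted adjacency matrix with zero diagonal."""
    d = W.sum(axis=1)                      # degrees d_i = sum_j w_ij
    L = np.diag(d) - W                     # graph Laplacian
    Lp = np.linalg.pinv(L)                 # pseudo-inverse L^+
    g = np.diag(Lp)
    return d.sum() * (g[:, None] + g[None, :] - 2.0 * Lp)

def commute_times_spectral(W):
    """Same quantity via L^+ = sum_{i>=2} (1/lambda_i) v_i v_i^T.
    The graph must be connected, so only the smallest eigenvalue is zero."""
    d = W.sum(axis=1)
    lam, V = np.linalg.eigh(np.diag(d) - W)
    Lp = sum(np.outer(V[:, i], V[:, i]) / lam[i] for i in range(1, len(lam)))
    g = np.diag(Lp)
    return d.sum() * (g[:, None] + g[None, :] - 2.0 * Lp)
```

On the 3-node path graph with unit weights, V_G = 4 and the effective resistance between the two endpoints is 2, so their commute time is 8; both routines agree on this.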
The idea is based on the fact that θ = √V_G L^+ B^T W^{1/2} is a commute time embedding, where the commute time c_ij is the squared Euclidean distance between points i and j in θ. Here m is the number of edges in G, B is the m × n signed edge-vertex incidence matrix and W is the m × m diagonal matrix whose entries are the edge weights. For the details of the embedding creation, refer to [17].
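A minimal sketch of this embedding is below. It is our illustration of the idea, with a dense pseudo-inverse standing in for the near-linear-time SDD solver of [17], so it is only usable on toy graphs; the ±1 random projection compresses the m-dimensional edge space down to k_rp dimensions:

```python
import numpy as np

def approx_commute_embedding(W, k_rp=500, seed=0):
    """Sketch of the commute time embedding theta = sqrt(V_G) L^+ B^T W^{1/2},
    compressed by a +/-1 random projection. np.linalg.pinv stands in for the
    SDD solver, so this is illustrative only."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    iu, ju = np.triu(W).nonzero()              # one (u, v) pair per edge
    m = len(iu)
    B = np.zeros((m, n))                       # signed edge-vertex incidence matrix
    B[np.arange(m), iu] = 1.0
    B[np.arange(m), ju] = -1.0
    w = W[iu, ju]                              # edge weights
    Q = rng.choice([-1.0, 1.0], size=(k_rp, m)) / np.sqrt(k_rp)
    L = np.diag(W.sum(axis=1)) - W
    # Z is k_rp x n; c_ij is approximated by ||Z[:, i] - Z[:, j]||^2
    return np.sqrt(W.sum()) * (Q @ (np.sqrt(w)[:, None] * B)) @ np.linalg.pinv(L)
```

By the Johnson-Lindenstrauss argument behind [17], with k_rp in the hundreds the squared distances between columns of Z approximate the exact commute times within a small relative error with high probability.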
3 Incremental Commute Time

3.1 Problem and Scope

Problem: Given a dataset or a graph D from an underlying domain of interest, when a new data instance p comes in, we want to compute the commute time from p to any data instance in D.

In a Euclidean space, an insertion of a new point does not change the features of existing points. However, an insertion of a new node in the original feature space or a graph will change the features of existing points in the commute time embedding space, which is spanned by eigenvectors of the graph Laplacian matrix. Updating the eigensystem of a graph Laplacian is costly and not suitable for online applications. In this work, we use the characteristics of random walk measures to estimate the commute time incrementally in constant time and use it to design online applications.

There are some notes regarding the scope of this work. Firstly, the proposed method is only suitable for applications which do not need to update the training model over time (i.e. representative training data are available). That means we treat the new data points one by one, estimate their corresponding commute times, and leave the trained model intact. Secondly, in the case of graph data, we only deal with node insertion, not node deletion or weight update.

3.2 Motivating Examples

Consider the graph G shown in Figure 1a, where all edge weights equal 1. The sum of the node degrees is V_G = 8. We will calculate the commute time c_12 in two different ways:

Fig. 1: (a) 4-node graph; (b) adding node 5. c_12 increases after an addition of node 5 even though the shortest path distance remains unchanged.

1. Using the random walk approach: note that the expected number of steps for a random walk starting at node 1 and returning back to it is V_G/d_1 = 8/1 = 8 [10]. But the walk from node 1 can only go to node 2 and then return from node 2 to 1. Thus c_12 = 8.
2. Using the algebraic approach: the Laplacian matrix is L, the pseudo-inverse is L^+, and (e_1 - e_2)^T L^+ (e_1 - e_2) = 1. Since c_12 = V_G (e_1 - e_2)^T L^+ (e_1 - e_2), we get c_12 = 8 · 1 = 8.

Suppose we add a new node (labeled 5) to node 4 with a unit weight, as in Figure 1b. Then c_12^new = V_G^new/d_1 = 10/1 = 10. The example in Figure 1b shows that by adding an edge, i.e. making the cluster which contains node 2 denser, c_12 increases. This shows that the commute time between two nodes captures not only the distance between them (as measured by the edge weights) but also the data densities. For the proof of this claim, see [8]. This property of commute time has been used to simultaneously discover global and local anomalies in data, an important problem in the anomaly detection literature.

In the above example, we exploited the specific topology (a degree one node) of the graph to calculate the commute time efficiently. This can only work for very specific instances. The general, more widely used but slower approach for computing the commute time is to use the Laplacian formula in Equation 1. One key contribution of this paper is that, for an incremental computation of commute time, we can use insights from this example to efficiently approximate the commute time using random walks in much more general situations.

3.3 Incremental Estimation of Commute Time

In this section, we derive a new method for computing the commute time in an incremental fashion. This method uses the definition of commute time based on the hitting time. The basic intuition is to expand the hitting time recursion until the random walk has moved a few steps away from the new node and then use the old values. In Section 5 we will show that this method results in remarkable agreement between the batch and online modes. We deal with two cases, shown in Figure 2:

1. Rank one perturbation corresponds to the situation when the new node connects to one other node in the existing graph.
2. Rank k perturbation deals with the situation when the new node has k neighbors in the existing graph.
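The worked example of Section 3.2 can be reproduced with Equation 1. The adjacency matrices below encode the graph of Figure 1 as we read it from the text (node 1 attached only to node 2, with nodes 2, 3 and 4 forming a triangle, so that V_G = 8):

```python
import numpy as np

def ct(W, i, j):
    """Commute time c_ij = V_G (e_i - e_j)^T L^+ (e_i - e_j)."""
    d = W.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(d) - W)
    e = np.zeros(len(d)); e[i], e[j] = 1.0, -1.0
    return float(d.sum() * e @ Lp @ e)

# Figure 1a: node 1 hangs off node 2; nodes 2, 3, 4 form a triangle (V_G = 8).
W4 = np.array([[0., 1., 0., 0.],
               [1., 0., 1., 1.],
               [0., 1., 0., 1.],
               [0., 1., 1., 0.]])
c12_before = ct(W4, 0, 1)          # 8, matching both calculations above

# Figure 1b: attach node 5 to node 4 with unit weight (V_G becomes 10).
W5 = np.zeros((5, 5)); W5[:4, :4] = W4
W5[3, 4] = W5[4, 3] = 1.0
c12_after = ct(W5, 0, 1)           # 10, although the shortest path is unchanged
```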
Fig. 2: Rank one (a) and rank k (b) perturbation when a new data point arrives.

Rank one perturbation

Proposition 1. Let i be a new node connected by one edge to an existing node l in the graph G. Let w_il be the weight of the new edge, and let j be an arbitrary node in G. Then

c_ij = c_lj^old + (V_G + 2w_il)/w_il + O(1/k)    (2)

where 'old' represents the commute time in graph G (a k-nearest neighbor graph) before adding i.

Proof. (Sketch) Since the random walk needs to pass through l before reaching j, the commute distance from i to j is

c_ij = c_il + c_lj.    (3)

It is known that

c_il = (V_G + 2w_il)/w_il    (4)

where V_G is the volume of graph G [8]. We also know c_lj = h_jl + h_lj and h_jl = h_jl^old. The only unknown factor is h_lj. By definition:

h_lj = 1 + Σ_{q∈N(l)} p_lq h_qj = 1 + Σ_{q∈N(l), q≠i} p_lq h_qj + p_li h_ij.

Since commute time is robust against small changes or perturbations in the data, we have h_qj ≈ h_qj^old. Moreover, p_lq = (1 - p_li) p_lq^old, and h_ij = 1 + h_lj. Therefore,

h_lj ≈ 1 + (1 - p_li) Σ_{q∈N(l), q≠i} p_lq^old h_qj^old + p_li (1 + h_lj)
     = 1 + (1 - p_li)(h_lj^old - 1) + p_li (1 + h_lj).

After simplification, h_lj = h_lj^old + 2p_li/(1 - p_li). Then c_lj ≈ h_jl^old + h_lj^old + 2p_li/(1 - p_li). Since there is only one edge connecting i to G, i is likely an isolated point and thus p_li = O(1/k) (G is the k-nearest neighbor graph). Then

c_lj ≈ h_jl^old + h_lj^old + O(1/k) = c_lj^old + O(1/k).    (5)
As a result, from Equations 3, 4 and 5:

c_ij = (V_G + 2w_il)/w_il + c_lj^old + O(1/k),

which is Equation 2.

Rank k perturbation. The rank k perturbation analysis is more involved, but the final formulation is an extension of the rank one case.

Proposition 2. Let l ∈ G denote one of the k neighbors of i, and let j be a node in G. The approximate commute time between nodes i and j is

c_ij ≈ Σ_{l∈N(i)} p_il c_lj^old + (V_G + 2d_i)/d_i + O(1/k)    (6)

For the proof, see the Appendix in the supplementary document. When k = 1 (the rank one case), Equation 6 reduces to Equation 2.

4 Online Applications Using Incremental Commute Time

We return to our original motivation for computing incremental commute time. We are given a dataset D which is representative of the underlying domain of interest. We need to find the nearest neighbors of a new data point p in the commute time metric incrementally, to check whether p is an anomaly in D. We train on the dataset D using Algorithm 1. First, a mutual k_1-nearest neighbor graph is constructed from the dataset. This graph connects nodes u and v if u belongs to the k_1 nearest neighbors of v and v belongs to the k_1 nearest neighbors of u [11]. Then the approximate commute time embedding θ is computed as in Section 2.2. Finally, a distance-based anomaly detection technique with the pruning rule proposed by Bay and Schwabacher [2] is applied in θ to find the top N anomalies. That means the distance-based method uses commute time instead of Euclidean distance. It has been shown that a distance-based approach using commute time can simultaneously identify global, local and even group anomalies in data [8]. The anomaly score used is the average commute time of a data instance to its k_2 nearest neighbors.

Pruning Rule [2]: A data point is not an anomaly if its score (e.g. the average distance to its k nearest neighbors) is less than an anomaly threshold.
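Stepping back to Section 3.3, Proposition 1 can be sanity-checked numerically. The snippet below uses a complete graph on 10 nodes (our choice for a graph where p_li is small, not the paper's kNN setting) and compares the rank-one approximation c_lj^old + (V_G + 2 w_il)/w_il against the exact commute time recomputed on the enlarged graph:

```python
import numpy as np

def ct_matrix(W):
    """All pairwise commute times via the Laplacian pseudo-inverse (Equation 1)."""
    d = W.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(d) - W)
    g = np.diag(Lp)
    return d.sum() * (g[:, None] + g[None, :] - 2.0 * Lp)

n = 10
W = np.ones((n, n)) - np.eye(n)            # complete graph K10, unit weights
C_old = ct_matrix(W)
vol_old = W.sum()                          # V_G of the old graph (= 90)

# Rank one case: attach a new node i (index n) to l = 0 with weight w_il = 1.
W_new = np.zeros((n + 1, n + 1)); W_new[:n, :n] = W
W_new[0, n] = W_new[n, 0] = 1.0
C_new = ct_matrix(W_new)

l, j, w_il = 0, 5, 1.0
approx = C_old[l, j] + (vol_old + 2 * w_il) / w_il   # right-hand side of Eq. 2
exact = C_new[n, j]                                  # exact value on the new graph
```

On this graph the approximation (110) and the exact value (110.4) differ by less than one percent.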
The threshold can be fixed or adjusted as the score of the weakest anomaly found so far. Using the pruning rule, many non-anomalies can be pruned without carrying out a full nearest neighbor search. After training, the corresponding graph G, the commute time embedding θ, and the anomaly threshold τ are obtained (τ is the score of the weakest anomaly among the top N anomalies). We propose a method, shown in Algorithm 2 (denoted iect), to detect anomalies online given the trained model. When a new data point p arrives, it is connected to the graph G created in the training phase so that the property of the mutual nearest neighbor graph is preserved. The commute times are incrementally updated to estimate the anomaly score
Algorithm 1 Approximate Commute Time Distance Based Anomaly Detection (for training).
Input: Data matrix X, the numbers of nearest neighbors k_1 (for building the k-nearest neighbor graph) and k_2 (for estimating the anomaly score), the number of random vectors k_RP, the number of anomalies to return N
Output: Top N anomalies, anomaly threshold τ
1: Construct a mutual k-nearest neighbor graph G from the dataset (using k_1)
2: Compute the approximate commute time embedding θ from G
3: Find the top N anomalies using a distance-based technique with the pruning rule described in [2] on θ (using k_2)
4: Return the top N anomalies and the anomaly threshold τ

Algorithm 2 Online Anomaly Detection using the incremental Estimation of Commute Time (iect)
Input: Graph G, the approximate commute time embedding θ and the anomaly threshold τ computed in the training phase, and a new arriving data point p
Output: Determine whether p is an anomaly or not
1: Add p to G satisfying the property of the mutual nearest neighbor graph
2: Determine if p is an anomaly by estimating its anomaly score incrementally using the method described in Section 3.3, applying the pruning rule with threshold τ to reduce the computation
3: Return whether p is an anomaly or not

of p using the approach in Section 3.3. The embedding θ is used to compute the commute times c^old. The pruning rule is used as follows: p is not an anomaly if its average distance to its k nearest neighbors is smaller than the anomaly threshold τ. Generally, commute time is robust against small changes or perturbations in the data. Therefore, only the anomaly score of a new data point needs to be estimated and compared with the anomaly threshold computed in the training phase. This claim will be verified by experiments in Section 5.

Analysis. The incremental estimation of commute time in Section 3.3 requires O(k_RP) time for each query of c^old in θ.
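Before continuing with the cost analysis, here is a concrete sketch of the pruning step used in Algorithms 1 and 2. It is our simplified stand-in: theta holds training points in the commute time embedding (one row each), squared Euclidean distances play the role of commute times, and the scan abandons the score computation as soon as the running k_2-nearest-neighbor average drops below the threshold:

```python
import numpy as np

def is_anomaly(p, theta, tau, k2):
    """Pruned anomaly check in the style of Bay and Schwabacher [2]: the
    average squared distance from p to its k2 nearest rows of theta is the
    anomaly score; once the average over the k2 closest points seen so far
    is below tau, the final score can only be smaller, so p is pruned."""
    best = []                                   # k2 smallest squared distances so far
    for q in theta:
        best.append(float(((p - q) ** 2).sum()))
        best.sort()
        best = best[:k2]
        if len(best) == k2 and sum(best) / k2 < tau:
            return False                        # pruned: cannot be an anomaly
    return sum(best) / len(best) >= tau
```

For most (normal) points the loop exits after a handful of comparisons, which is exactly why the per-point cost in the analysis below stays small.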
So if there are k edges added to the graph due to the addition of a new node, it takes O(k k_RP) for each query of c_ij. As explained earlier, we only need to compute the anomaly score of the new data point. Using the pruning rule with the known anomaly threshold, it takes only an O(k_2) nearest neighbor search to determine whether the test point is an anomaly, where k_2 is the number of nearest neighbors for estimating the anomaly score. Each commute time query takes O(k k_RP) as described above. Therefore, iect takes O(k_2 k k_RP) to determine if a new arriving point is an anomaly or not. [19] has suggested k_RP = 2 ln n/ε^2, which is just 442 (taking ε = 0.25) for a dataset of
a million data points. Therefore k_RP ≪ n. Since k, k_2 ≪ n, O(k_2 k k_RP) = O(1), resulting in a near-constant time complexity for iect. Note that this constant time complexity of iect does not depend on the O(k_RP) complexity for each query of c^old using the method in [17]. If we query c^old using Equation 1 with just O(k_EV) eigenvectors of the Laplacian matrix L (as described in [8]), each query only takes O(k_EV), also resulting in a constant time complexity for iect.

5 Experiments and Results

In this section, we evaluated the effectiveness of the online anomaly detection application using incremental commute time. The experiments were carried out on synthetic as well as real datasets. In all experiments, the numbers of nearest neighbors were k_1 = 10 (for building the nearest neighbor graph) and k_2 = 20 (for estimating a nearest neighbor score or an anomaly score in anomaly detection applications), and the number of random vectors was k_RP = 200 (for creating the commute time embedding), unless otherwise stated. We used Koutis's CMG solver [9], which is for SDD matrices and is available online, as an implementation of the SDD solver for creating the embedding. The choice of parameters was determined from the experiments and is also analyzed in Section 5.5. Source code and data can be accessed at 0B6LuuZJnvhFdTldkMmE1clk2T28/view?usp=sharing

5.1 Approach

We split a dataset into two parts: a training set and a test set. We trained on the training set to find the top N anomalies and the threshold value τ using Algorithm 1. Then an anomaly score of each instance p in the test set was calculated based on its k_2 neighbors in the training set. If this score was greater than τ then the test instance was reported as an anomaly.
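The test-time decision just described, together with the threshold selection heuristic used in this section (training points scoring more than three standard deviations above the mean are flagged, and τ is the weakest flagged score), can be sketched as follows; the function name and return convention are ours:

```python
import numpy as np

def anomaly_threshold(scores):
    """Three-sigma threshold selection sketch: flag training points whose
    anomaly score exceeds mean + 3 * std, and set tau to the weakest
    (smallest) flagged score. Returns (cutoff, tau); tau is None when no
    training point is flagged."""
    scores = np.asarray(scores, dtype=float)
    cutoff = scores.mean() + 3.0 * scores.std()
    flagged = scores[scores > cutoff]
    tau = float(flagged.min()) if flagged.size else None
    return cutoff, tau
```

At test time a new point is then reported as an anomaly exactly when its score is greater than tau.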
During the search for the nearest neighbors of p, if its average distance to the nearest neighbors found so far is smaller than τ, we can stop the search as p is not an anomaly (pruning rule). In practice, it is not trivial to know the number of anomalies N in the training data so that we can find the top N and set the anomaly threshold. We investigated a method to find the threshold as follows: in the training phase, we computed the anomaly scores of all the data points and obtained the mean and standard deviation of the scores. Anomalies were data points whose scores were more than three standard deviations above the mean score, and N was the number of anomalies found.

Baseline: in all experiments, the batch method (Algorithm 1) was used as the benchmark, since there is no other method to estimate commute time incrementally. Note that for both the batch and incremental methods, we only need to compute the anomaly score of the new arriving data instance, and pruning was also applied using τ. The difference is that in the batch method, the new approximate commute time embedding was recomputed and the anomaly score was estimated in the new embedding space. The incremental method, on the
other hand, estimated the score incrementally using the method described in Section 3.3.

5.2 Synthetic Datasets

We created six synthetic datasets with 1000, 10000, 20000, 30000, and larger numbers of data points. Each dataset contained several clusters generated from Normal distributions and 100 random points generated from a uniform distribution, which were likely anomalies. The number, sizes, and locations of the clusters were chosen randomly. Each dataset was divided into a training set and a test set. There were 100 data points in every test set and half of them were the random anomalies mentioned above.

Experiments on Robustness: We first tested the robustness of commute time between nodes in an existing graph when a new node is introduced. As the commute time c_ij is a measure of expected path distance, the hypothesis is that the addition of a new point will have minimal influence on c_ij and thus the anomaly scores of data points in the existing set are relatively unchanged. Table 1 shows the average, standard deviation, minimum, and maximum of the anomaly scores of points in graph G before and after a new data point was added to G. Graph G was created from the training set of the 1000 point dataset described above. The result was averaged over the 100 test points in the test set. It shows that the anomaly scores of data instances in G do not change much when a new point is added to G (the change of the average score was only about 0.7%).

Table 1: Robustness of commute time. The anomaly scores (average, standard deviation, minimum and maximum, with and without the test point) of data instances in the existing graph G are relatively unchanged when a new point is added to G.

In the following experiments, the change in the eigensystem of the graph Laplacian L of the training data due to the addition of a new node was analyzed.
Figure 3a shows the average changes in the top 50 eigenvalues before and after the addition of each test point in the test set of the 1000 point dataset. The changes are small (most were less than 1% and all were less than 6%). Figure 3b shows the dot products of the eigenvectors with the second smallest eigenvalue (the smallest is zero) before and after the addition of each test point. The eigenvectors did not change much after a new node was added to the graph. As shown in Equation 1, since the change in the eigensystem of the Laplacian is small, the commute times between existing training nodes do not change much. All these results show that commute time is a robust measure: a small change or perturbation in the data will not result in large changes in commute times. Therefore, only the anomaly score of the new point needs to be estimated.
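Consistent with these robustness observations, there is one case where nothing changes at all. It is a known circuit-theory fact, not a result from the paper: attaching a degree-one node leaves the effective resistance (commute time divided by V_G) between every pair of existing nodes exactly unchanged, because the pendant branch carries no current between them; their commute times then shift only by the global volume factor. A small check on a ring graph (our toy example):

```python
import numpy as np

def ct_matrix(W):
    """All pairwise commute times via the Laplacian pseudo-inverse (Equation 1)."""
    d = W.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(d) - W)
    g = np.diag(Lp)
    return d.sum() * (g[:, None] + g[None, :] - 2.0 * Lp)

n = 8
W = np.zeros((n, n))                      # ring of 8 nodes, unit weights
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
R_old = ct_matrix(W) / W.sum()            # effective resistances R = C / V_G

W_new = np.zeros((n + 1, n + 1)); W_new[:n, :n] = W
W_new[0, n] = W_new[n, 0] = 1.0           # pendant node attached to node 0
R_new = ct_matrix(W_new) / W_new.sum()
# R_new[:n, :n] equals R_old exactly: resistances among existing nodes are
# untouched, so their commute times change only by the volume factor V_G.
```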
Fig. 3: Change in the eigensystem when new nodes were added to the graph: (a) eigenvalue changes (%) and (b) dot products of second eigenvectors (before vs. after), both plotted against the test index.

Experiments on Effectiveness: We applied iect to all six datasets mentioned earlier. The effectiveness of iect and the commute time approximation are reported and discussed here. Table 2 presents the accuracy and performance of iect on the six synthetic datasets. Average score is the average anomaly score with the pruning rule over the 100 test points. The precision and recall are for the anomalous class. The time is the average time to process each of the 100 test points. iect captured all the anomalies, had few false alarms, and was much more efficient than the batch method. Note that the scores shown here are the anomaly scores with the pruning rule, and the scores for anomalies are always much higher than the scores for normal points. Therefore the average scores shown in the table are dominated by the scores of anomalies.

Table 2: Effectiveness of the incremental method (precision, recall, average score and time per test point for iect and the batch method at each dataset size). iect captured all the anomalies, had few false alarms and was much more efficient than the batch method.

There is an interesting dynamic at play between the pruning rule and the number of anomalies in the data, because of the high proportion of anomalies in the test set (about 50%). The pruning rule only works for non-anomalies, and therefore the time to process anomalies should be much longer than for the other points. Table 3 shows the details of the time to process data
points in the test set. For the batch and iect methods, the average times to process only anomalies, only other data points (non-anomalies), and all data instances are listed in the table. In the batch method there was not much difference between the time to process anomalies and non-anomalies, since for each new data point the time to create the new commute time embedding was much higher than that of the nearest neighbor search. For iect, on the other hand, this gap was very large, so that non-anomalies were processed much faster than anomalies. In practice, since most data points are not anomalies, iect is very efficient. Another cost we have not yet mentioned is the time to update the graph, i.e. the time to add a new data point to an existing graph while satisfying the property of the mutual nearest neighbor graph. Since we stored the kd-tree corresponding to the training data, the update cost was very low, as shown in Table 3.

Table 3: Performance of the incremental method (graph update time, and average times to process anomalies, other points, and all points for iect and the batch method at each dataset size). In iect, non-anomalies were processed much faster than anomalies.

5.3 Graph Dataset

In this section, we evaluated the iect method on a large DBLP co-authorship network to show the scalability of the method. In this graph, nodes are authors and edge weights are the numbers of co-authored papers between authors. Since the graph is not fully connected, we extracted its biggest component, which has 612,949 nodes and 2,345,178 edges in a snapshot taken on December 5th, 2011 and is available online. We randomly chose a test set of 50 nodes and removed them from the graph, ensuring that the graph remained connected. After training, each node was added back into the graph along with its associated edges. We trained the graph using Algorithm 1 and stored the approximate embedding in order to query c^old in the iect algorithm.
The batch method used the approximate embedding created from the new graph after adding each test point. The result shows that iect took only 8 milliseconds on average over the 50 test data points to detect whether each test point was an anomaly or not. The batch method, using the fastest approximation of commute time to date, required 1,454 seconds on average to process each test data point. This dramatically highlights the constant time complexity of the iect algorithm and suggests that iect is highly suitable for the computation of commute time in an incremental fashion. Since there was no anomaly information in the random test set, we cannot report detection accuracy here. The average anomaly score over all
the test points of iect was only 8.6% higher than that of the batch method. This shows the high accuracy of the iect approximation even in a very large graph.

5.4 Real Datasets

In this experiment, we report the results for online anomaly detection using real datasets from different application domains: network intrusion detection, video surveillance and bridge damage detection.

Spambase dataset: The Spambase dataset provided by the UCI Machine Learning Repository [4] was investigated. There are 4,601 emails in the data, with 57 features each. The task is to check whether an email is spam or not. Since the dataset has duplicated data instances, and the numbers of spams and non-spams are not imbalanced, we removed the duplicates, kept the non-spams, and sampled 100 spams from the dataset, leaving 2,631 data instances.

Computer network anomaly detection: The dataset is from a wireless mesh network at the University of Sydney which was deployed by NICTA [20]. A traffic generator was used to simulate traffic on the network. Packets were aggregated into one-minute time bins and the data was collected over 24 hours. There were 391 origin-destination flows and 1,270 time bins. Anomalies, including DOS attacks and ping floods, were introduced to the network. After removing duplications in the data, we had 1,193 time-bin instances.

Damage detection on bridge: The Sydney Harbour Bridge is one of the major bridges in Australia and was opened in 1932. As the bridge is aging, it is critical to ensure it stays structurally healthy. There are 800 jack arches on the underside of the deck of the bus lane (lane seven) that need to be monitored. Vibration data caused by passing vehicles were recorded by three-axis accelerometers installed under the deck of lane seven. For this case study, only six instrumented joints were considered (named 1 to 6).
The data were obtained in the period from early August until late October. A known crack existed in joint 4, while the other joints were in good condition. The feature extraction described in [7] was used. A dataset was created to include vibration events from all healthy joints and 100 events from the damaged joint (2,523 events in total). Each dataset was divided into a training set and a test set with 100 data points, except that in the video dataset the test set only contained 38 data objects. The anomaly threshold τ was set based on the training data as the weakest score of the anomalies in the training set. The results of the iect and batch methods are shown in Table 4. iect has a high detection accuracy and is much more efficient than the batch method. The commute time scores of iect and the batch method were also quite similar.

5.5 Impact of Parameters

In this section, we investigate how the parameters k_1, k_2, and k_RP affect the effectiveness of the proposed method. Parameters k_1 and k_2 only affect the accuracy of computing commute time in batch mode and were analyzed in [8]. Therefore, this section analyses the impact of k_RP on the incremental commute time.
Table 4: The effectiveness of iect on real datasets (precision, recall, and average score and time for iect and the batch method on the Spambase, Network and Bridge datasets). iect has a high detection accuracy and is much more efficient than the batch method.

We conducted an experiment with different values of k_RP for the three real datasets mentioned in the previous section. The results in Figure 4 show that the method can achieve high accuracy with a small k_RP and is not sensitive to k_RP.

Fig. 4: Accuracy against k_RP for (a) Spambase, (b) Network and (c) Bridge. The method can achieve high accuracy with a small k_RP and is not sensitive to k_RP.

5.6 Summary and discussion

The experimental results show that iect can accurately approximate the commute time in constant time. It is much more efficient than the batch method of Algorithm 1. The results on real datasets collected from different domains and applications show a similar tendency, demonstrating the reliability and effectiveness of the proposed method. One weakness of iect is that it can only be used in online applications where the graph update is the addition of a new node, not an update of edge weights. In the case of updating edge weights, however, the method by Ning et al. [12] can be used. This method incrementally updates the eigenvalues and eigenvectors of the graph Laplacian matrix based on a change of an edge weight in the graph. The new eigenpairs of the Laplacian can then be used to update the commute time.

6 Related work

Khoa and Chawla [8] proposed a method to find anomalies using commute time. They showed that, unlike Euclidean distance, the commute time between two nodes can capture both the distance between them and their densities, so that it can capture both global and local anomalies using distance-based methods such as those in [2].
Incremental learning via updates to an eigen decomposition has been studied for a long time. Early work considered the rank-one modification of the symmetric eigen decomposition [5, 6], reducing the original problem to the eigen decomposition of a diagonal matrix. Although these methods give a good approximation of the new eigenpairs, they are not suitable for today's online applications since each update costs at least O(n^2). A more recent approach is based on matrix perturbation theory [1]. It uses a first-order perturbation analysis of the rank-one update of a data covariance matrix to compute the new eigenpairs, and its updates take linear time. The advantage of using the covariance matrix is that when the perturbation is the insertion of a new point, the size of the covariance matrix is unchanged. Consequently, this approach cannot be applied directly when the matrix grows with each insertion; for example, in spectral clustering or commute time based anomaly detection, the size of the graph Laplacian matrix increases when a new point is added to the graph. Ning et al. [12] proposed an incremental approach to spectral clustering for monitoring evolving blog communities. It incrementally updates the eigenvalues and eigenvectors of the graph Laplacian matrix after a change of an edge weight on the graph, using the first-order error of the generalized eigensystem. This algorithm is only suitable for edge weight updates, not for the addition of a new node.

7 Conclusion

In this paper, we proposed a method to approximate the commute time incrementally and used it to design an online anomaly detection application. The method incrementally estimates the commute time in constant time using properties of random walks and hitting times.
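The hitting time in question satisfies the one-step recursion h(i, j) = 1 + Σ_k P_ik h(k, j), with h(j, j) = 0, where P is the random walk transition matrix. The following sketch solves this recursion by fixed-point iteration as an illustration of the quantity itself (it is not the constant-time incremental estimate; the function name is hypothetical):

```python
import numpy as np

def hitting_times_to(j, W, n_iter=2000):
    """Expected hitting times h(i, j) for every start node i, obtained by
    iterating the recursion h(i, j) = 1 + sum_k P_ik h(k, j), h(j, j) = 0,
    where P is the random walk transition matrix of the graph."""
    P = W / W.sum(axis=1, keepdims=True)
    h = np.zeros(len(W))
    for _ in range(n_iter):
        h = 1.0 + P @ h
        h[j] = 0.0                      # boundary condition at the target
    return h

# Unweighted path graph 0 - 1 - 2, target node 0: h = [0, 3, 4].
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
h = hitting_times_to(0, W)
```

Summing the hitting times in both directions, h(i, j) + h(j, i), gives the commute time between i and j.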
The main idea is to expand the hitting time recursion until the random walk has moved a few steps away from the new node and then to reuse the old values. The experimental results on synthetic and real datasets show the effectiveness of the proposed approach in terms of both performance and accuracy. iect can incrementally estimate the commute time accurately, resulting in high detection accuracy on several datasets from different applications, and it took only 8 milliseconds on average to process a newly arriving node in a graph of more than 600,000 nodes and two million edges. Moreover, the idea of this work can be extended to other applications that use the commute time.

References

1. Agrawal, R.K., Karmeshu: Perturbation scheme for online learning of features: Incremental principal component analysis. Pattern Recognition 41 (2008)
2. Bay, S.D., Schwabacher, M.: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In: KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA (2003)
3. Fouss, F., Renders, J.M.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering 19(3) (2007)
4. Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010)
5. Golub, G.H.: Some modified matrix eigenvalue problems. SIAM Review 15(2) (1973)
6. Gu, M., Eisenstat, S.C.: A stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem. SIAM J. Matrix Anal. Appl. 15 (1994)
7. Khoa, N.L., Zhang, B., Wang, Y., Chen, F., Mustapha, S.: Robust dimensionality reduction and damage detection approaches in structural health monitoring. Structural Health Monitoring 13(4) (2014)
8. Khoa, N.L.D., Chawla, S.: Robust outlier detection using commute time and eigenspace embedding. In: PAKDD '10: Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Berlin/Heidelberg (2010)
9. Koutis, I., Miller, G.L., Tolliver, D.: Combinatorial preconditioners and multilevel solvers for problems in computer vision and image processing. In: ISVC '09: Proceedings of the 5th International Symposium on Advances in Visual Computing: Part I. Springer-Verlag, Berlin, Heidelberg (2009)
10. Lovász, L.: Random walks on graphs: a survey. Combinatorics, Paul Erdős is Eighty 2, 1-46 (1993)
11. von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4) (2007)
12. Ning, H., Xu, W., Chi, Y., Gong, Y., Huang, T.: Incremental spectral clustering with application to monitoring of evolving blog communities. In: SIAM International Conference on Data Mining (2007)
13. Sarkar, P., Moore, A.W.: A tractable approach to finding closest truncated-commute-time neighbors in large graphs. In: The 23rd Conference on Uncertainty in Artificial Intelligence (UAI) (2007)
14. Qiu, H., Hancock, E.: Clustering and embedding using commute times. IEEE TPAMI 29(11) (2007)
15.
Saerens, M., Fouss, F., Yen, L., Dupont, P.: The principal components analysis of a graph, and its relationships to spectral clustering. In: Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Springer-Verlag (2004)
16. Sarkar, P., Moore, A.W., Prakash, A.: Fast incremental proximity search in large graphs. In: ICML '08: Proceedings of the 25th International Conference on Machine Learning. ACM, New York, NY, USA (2008)
17. Spielman, D.A., Srivastava, N.: Graph sparsification by effective resistances. In: STOC '08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing. ACM, New York, NY, USA (2008)
18. Spielman, D.A., Teng, S.H.: Nearly-linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. CoRR abs/cs/ (2006)
19. Venkatasubramanian, S., Wang, Q.: The Johnson-Lindenstrauss transform: An empirical study. In: Müller-Hannemann, M., Werneck, R.F.F. (eds.) ALENEX. SIAM (2011)
20. Zaidi, Z.R., Hakami, S., Landfeldt, B., Moors, T.: Real-time detection of traffic anomalies in wireless mesh networks. Wireless Networks (2009)
More informationRobust Laplacian Eigenmaps Using Global Information
Manifold Learning and its Applications: Papers from the AAAI Fall Symposium (FS-9-) Robust Laplacian Eigenmaps Using Global Information Shounak Roychowdhury ECE University of Texas at Austin, Austin, TX
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More information