Structure based Data De-anonymization of Social Networks and Mobility Traces

Size: px
Start display at page:

Download "Structure based Data De-anonymization of Social Networks and Mobility Traces"

Transcription

1 Structure based Data De-anonymization of Social Networks and Mobility Traces Shouling Ji 1, Weiqing Li 1, Mudhakar Srivatsa 2, Jing S. He 3, and Raheem Beyah 1 Georgia Institute of Technology 1, IBM T. J. Watson Research Center 2, KSU 3 {sji, wli64}@gatech.edu, msrivats@us.ibm.com, jhe4@kennesaw.edu, rbeyah@ece.gatech.edu Abstract. We present a novel de-anonymization attack on mobility trace data and social data. First, we design an Unified Similarity (US) measurement, based on which we present a US based De-Anonymization (DA) framework which iteratively de-anonymizes data with an accuracy guarantee. Then, to de-anonymize data without the knowledge of the overlap size between the anonymized data and the auxiliary data, we generalize DA to an Adaptive De-Anonymization (ADA) framework. Finally, we examine DA/ADA on mobility traces and social data sets. 1 Introduction Social networking is a fast-growing business sector nowadays. To protect users privacy, social network owners usually anonymize data by removing Personally Identifiable Information (PII) before releasing the data to the public. However, this data anonymization is vulnerable to a new social auxiliary information based data de-anonymization attack [1][2][3]. For example, just recently, it was reported that poorly anonymized logs revealed New York City cab drivers detailed whereabouts (June 23, 2014) [5]. A few de-anonymization attacks have been designed for social data [1][2] or mobility trace data [3]. In [1], Backstrom et al. proposed active and passive attacks on social data. Since the proposed active attack relies on sybil nodes to obtain auxiliary information before social data release, it is not practical as analyzed in [2]. For the passive attack designed in [1], it is workable however not scalable [2]. In [2], Narayanan and Shmatikov showed a de-anonymization attack on social network data which can be modeled by directed graphs (where direction information carried by data can be viewed as free auxiliary information for adversaries). In [3], Srivatsa and Hicks presented the first de-anonymization attack to mobility traces by using social networks as a side-channel. Our work improves existing works in some or all of the following aspects. First, we significantly improve the de-anonymizaiton accuracy and decrease the computational complexity by proposing a novel Core Matching Subgraphs (CMS) based adaptive de-anonymization strategy. Second, besides utilizing nodes local property, we incorporate nodes global property into de-anonymization without incurring

2 2 Authors Suppressed Due to Excessive Length high computational complexity. Furthermore, we also define and apply two new similarity measurements in the proposed de-anonymization technique. Finally, the de-anonymization algorithm presented in this work is a much more general attack framework. It can be applied to both mobility trace data and social data, directed and undirected data graphs, and weighted and unweighted data sets. We give the detailed analysis and remarks in the related work (due to space limitation, we put the related work in the Technical Report [17] of this paper). In summary, our main contributions are as follows 1. (i) We define three de-anonymization metrics, namely structural similarity, relative distance similarity, and inheritance similarity, (ii) Toward effective de-anonymization, we define a Unified Similarity (US) measurement by collectively considering the defined structural similarity, relative distance similarity, and inheritance similarity. Subsequently, we propose a US based De-Anonymization (DA) framework, by which we iteratively de-anonymize the anonymized data with an accuracy guarantee provided by a de-anonymization threshold and a mapping control factor. (iii) To de-anonymize data without the knowledge on the overlap size between the anonymized data and the auxiliary data, we generalize DA to an Adaptive De-Anonymization (ADA) framework. (iv) We apply the presented de-anonymization framework to mobility traces and social data sets. The experimental results demonstrate that the presented de-anonymization attack is very effective and robust. For instance, 93.2% of the users in Infocom06 [13] can be successfully de-anonymized given one seed mapping, and 58% of the users in Google+ can be de-anonymized given five seed mappings. 2 Preliminaries and Model Anonymized Data Graph. In this paper, we consider anonymized data which can be modeled by an undirected graph 2, denoted by G a = (V a, E a, W a ), where V a = {i i is a node} is the node set (e.g., users in an anonymized Google+ graph [10]), E a = {li,j a i, j V a, and there is a tie between i and j} is the set of all the links existing between any two nodes in V a (a link could be a friend relationship such as in Google+ [10]), and W a = {wi,j a i, j V a, li,j a E a, wi,j a is a real number} is the set of possible weights associated with links in E a (e.g., in a coauthor graph, the weight of a coauthor relationship could be the number of coauthored papers). If G a is an unweighted graph, we simply define wi,j a = 1 for each link la i,j Ea. For i V a, we define its neighbor set as N a (i) = {j V a lij a Ea }. Then, a i = N a (i) represents the number of neighbors of i in G a. For i, j V a, let p a (i, j) be a shortest path from i to j in G a and p a (i, j) be the number of links on 1 Due to the space limitation, we put more discussion and experimental results in the Technical Report [17] of this paper. 2 Note that, the de-anonymization algorithm designed in this paper can also be applied to directed graphs directly by overlooking the direction information on edges, or by incorporating the edge-direction based de-anonymizatoin heuristic in [2] which could obtain better accuracy.

3 Structure based Data De-anonymization 3 p a (i, j) (the number of links passed from i to j through p a (i, j)). Then, we define P a i,j = {pa (i, j)} the set of all the shortest pathes between i and j. Furthermore, we define the diameter of G a as D a = max{ p a (i, j) i, j V a, p a (i, j) P a i,j }, i.e., the length of the longest shortest path in G a. Auxiliary Data Graph. As in [2][3][4], we assume the auxiliary data is the information crawled in current online social networks, e.g., the follow relationships on Twitter [2], the friend relationships on Facebook [3], etc. Furthermore, similar as the anonymized data, the auxiliary data can also be modeled as an undirected graph G u = (V u, E u, W u ), where V u is the node set, E u is set of all the links (relationships) among the nodes in V u, and W u is the set of possible weights associated with the links in E u. As the definitions on the anonymized graph G a, we can define the neighborhood of i V u as N u (i), the shortest path set between i V u and j V u as P u (i, j) = {p u (i, j)}, and the diameter of G u as D u = max{ p u (i, j) i, j V u, p u (i, j) P u (i, j)}. In addition, we assume G a and G u are connected. Note that this is not a limitation of our scheme. The designed de-anonymization attack is also applicable to the case where G a or G u is not connected. We will discuss this in Section 4. Attack Model. Our de-anonymization objective is to map the nodes in the anonymized graph G a to the nodes in the auxiliary graph G u as accurate as possible. Formally, let γ(v) be the objective reality of v G a in the physical world. Then, an ideal de-anonymization can be represented by mapping Φ : G a G u, such that for v G a, Φ(v) = v if v = Φ(v) V u and Φ(v) = if Φ(v) / V u, where is a special not existing indicator in the auxiliary data graph. Now, let M = {(v 1, v 1), (v 2, v 2),, (v n, v n)} be the outcome of a deanonymization attack such that v i V a, v i = V a, n = V a (i = 1, 2,, n) and v i = Φ(v i), v i V u { } (i = 1, 2,, n). Then, the de-anonymization on v i is said to be successful if Φ(v i ) = γ(v i ) when γ(v i ) V u or Φ(v i ) = when γ(v i ) / V u ; and failure if Φ(v i ) {u u V u, u γ(v i )} { } when γ(v i ) V u or Φ(v i ) when γ(v i ) / V u. In this paper, we are aiming to design a de-anonymization attack with a high success rate (accuracy). 3 De-anonymization From a macroscopic view, the designed de-anonymization attack framework consists of two phases: seed selection and mapping propagation. In the seed selection phase, we identify a small number of seed mappings from the anonymized graph G a to the auxiliary graph G u serving as landmarks to bootstrap the deanonymization. In the mapping propagation phase, we de-anonymize G a through synthetically exploiting multiple similarity measurements. Seed Selection and Mapping Spanning. Seed selection is possible in our de-anonymization framework because of three reasons. The first reason is the common availability of huge amounts of social data, which is an open and rich source for obtaining a small number of seeds. For instance, the data published for academic and government data mining may also release some auxiliary information [4]. The second reason is the existence of multiple effective channels to obtain

4 4 Authors Suppressed Due to Excessive Length a small number of seed mappings (actually, we can obtain much richer auxiliary information), e.g., data leakage [2][4], third party applications [2], etc. The third reason is that a small number of seed mappings is sufficiently helpful (or e- nough depends on the required accuracy) to our de-anonymization framework. As shown in our experiments, a small number of seed mappings (sometimes even one seed mapping) are sufficient to achieve highly accurate de-anoymization. In our de-anonymization framework, we can select a small number of seed mappings by employing multiple seed selection strategies [1][2][3][4] individually or collaboratively, e.g., launching a small scale sybil attack [1][2], compromising a small number of nodes [1][2][3], third party applications [6][7][8], etc. Since seed selection is not our primary contribution in this paper, we assume we have identified κ seed mappings by exploiting the aforementioned strategies individually or collaboratively, denoted by M s = {(s 1, s 1), (s 2, s 2),, (s κ, s κ)}, where s i V a, s i V u, and s i = Φ(s i). In the mapping propagation phase, we start with the seed mapping M s and propagate the mapping (de-anonymization) to the entire G a iteratively. Let M 0 = M s be the initial mapping set and M k (k = 1, 2, ) be the mapping set after the k-th iteration. To facilitate our discussion, we first define some terminologies as follows. Let Mk a M = k {v i (v i, v i ) M k} and Mk u M = k {v i (v i, v i ) M k} \ { i=1 } be the sets of nodes that have been mapped until iteration k in G a and G u, respectively. Then, we define the 1-hop mapping spanning set of Mk a as Λ 1 (Mk a) = {v j V a v j / Mk a and v i Mk a s.t. v j N a (v i )}, i.e., Λ 1 (Mk a) denotes the set of nodes in G a that have some neighbor been mapped and themselves not been mapped yet. To be general, we can also define the δ- hop mapping spanning set of Mk a as Λδ (Mk a) = {v j V a v j / Mk a and v i Mk a s.t. pa (v i, v j ) δ}, i.e., Λ δ (Mk a) denotes the set of nodes in Ga that are at most δ hops away from some node been mapped and themselves not been mapped yet. Here, δ(δ = 1, 2, ) is called the spanning factor in the mapping propagation phase of the proposed de-anonymization framework. Similarly, we can define the 1-hop mapping spanning set and δ-hop mapping spanning set for Mk u as Λ1 (Mk u) = {v j V u v j / Mk u and v i M k u s.t. v j N u (v i )} and Λ δ (Mk u) = {v j V u v j / Mk u and v i Mk u s.t. pu (v i, v j ) δ}, respectively. Based on the defined δ-hop mapping sets Λ δ (Mk a) and Λδ (Mk u ), we try to seek a mapping Φ which maps the anonymized nodes in Λ δ (Mk a ) to some nodes in Λ δ (Mk u ) { } iteratively in the mapping propagation phase of our de-anonymization framework. Structural Similarity. Since both anonymized data and the auxiliary data can be modeled by graphs, the structural/topological characteristics could be a reference for coarse granularity (high level) de-anonymization. Here, coarse granularity de-anonymization implies for an anonymized node v V a, we deanonymize it by mapping it to some nodes {v v V u { } and v in G u is structurally similar to v in G a } even the ideal one-to-one mapping cannot be achieved. Structural characteristics based coarse granularity de-anonymization i=1

5 Structure based Data De-anonymization 5 is meaningful since we can employ further techniques to refine the coarse granularity de-anonymization and finally de-anonymize v exactly. In graph theory, the concept of centrality to measure the topological importance and characteristic of a node within a graph is often used. In this paper, we employ three centrality measurements to capture the topological property of a node in G a or G u, namely degree centrality, closeness centrality, and betweenness centrality. In the case that the considered data is modeled by a weighted graph, we also define the weighted version of the three centrality measurements. Furthermore, to demonstrate the aforementioned topological properties, we will employ an example data set (St Andrews [11]), which consists of a mobility trace data set of 27 users and a Facebook network of the same 27 users. The mobility trace data set has 18,241 WiFi records. We use the method in [3] to construct an anonymized graph on the 27 users based on the mobility trace data and take the Facebook network as the auxiliary data. Degree Centrality and Weighted Degree Centrality. The degree centrality is defined as the number of ties that a node has in a graph. For instance, in the considered anonymized data graph, the degree centrality of v V a is defined as d v = a v = N a (v). We calculate the degree centrality of the nodes in St Andrews and their counterparts in Facebook, and the result is shown in Fig.1 (a). From Fig.1 (a), we observe that the degree centrality distributions of the anonymized graph and auxiliary graph are similar, which implies degree centrality can be used for de-anonymization. On the other hand, multiple nodes in both graphs may have similar degree centrality, which suggests that degree centrality can be used for coarse granularity de-anonymization. When the data being considered is modeled by a weighted graph, the weights on links provide extra information in characterizing the centrality of a node. In this case, the degree centrality defined for an unweighted graph cannot properly reflect a node s structural importance [9]. To consider both the number of links associated with a node and the weights on these links, we define the weighted degree centrality for v V a as wd v = a u N v( a (v) ) α, where α is a positive a v tuning parameter that can be set according to the research setting and data [9]. Basically, when 0 α 1, high degree is considered more important, whereas when α 1, weight is considered more important. Similarly, we can define the w u weighted degree centrality for v V u as wd v = u v ( u N u (v v,u ) ) α. u v Closeness Centrality and Weighted Closeness Centrality. From the definition of degree centrality, it indicates the local property of a node since only the adjacent links are considered. To fully characterize a node s topological importance, some centrality measurements defined from a global view are also important and useful. One manner to count a node s global structural importance is by closeness centrality, which measures how close a node is to other nodes in a graph and is defined as the ratio between n 1 and the sum of its distances to all other nodes. In the definition, n is the number of nodes and distance is the length in terms of hops from a node to another node in a graph. Formally, for w a v,u

6 6 Authors Suppressed Due to Excessive Length Degree Centrality St Andrews Facebook Node ID (a) Degree centrality Closeness Centrality St Andrews Facebook Node ID (b) Closeness centrality Betweenness Centrality St Andrews Facebook Node ID (c) Betweenness centrality Structural Similarity Score Counterpart Min Max Avg Node ID Relative Distance Similarity Score Counterpart Min 5 Max 0 Avg Node ID Counterpart Min Max Avg Node ID (d) s S(v, v ) (e) s D(v, v ) (f) s I(v, v ) Fig. 1. Structural/relative distance/inheritance similarity. v V a, its closeness centrality c v is defined as c v = Inheritance Similarity Score V a 1 u V a,u v p a (v,u). Similarly, the closeness centrality c v of v V u V is defined as c v = u 1 p u (v,u ). u V u,u v Fig.1 (b) demonstrates the closeness centrality score of the nodes in St Andrews and their counterparts in the corresponding social graph Facebook. From Fig.1 (b), the closeness centrality distribution of nodes in the anonymized graph generally agrees with that in the auxiliary graph, which suggests that closeness centrality can be a measurement for de-anonymization. In the case that the data being considered is modeled by a weighted graph, we generalize the weighted closeness centrality for v V a and v V u V as wc v = a 1 wc v = V u 1 p a u V a w (v,u) and,u v p u u V u,u v w (v,u ), respectively, where pa w(, )/p u w(, ) is the shortest path between two nodes in a weighted graph. Betweenness Centrality and Weighted Betweenness Centrality. Besides closeness centrality, betweenness centrality is another measure indicating a node s global structural importance within a graph, which quantifies the number of times a node acts as a bridge (intermediate node) along the shortest path between two other nodes. Formally, for v V a, its betweenness centrality b v in G a 2 is defined as b v = ( V a 1)( V a 2) σ a xy (v) σ, where x, y V a, xy a x v y σxy a = P a (x, y) is the number of all the shortest paths between x and y in G a, and σxy(v) a = {p a (x, y) P a (x, y) v is an intermediate node on path p a (x, y)} is the number of shortest paths between x and y in G a that v lies on. Similarly, the betweenness centrality b v of v V u in G u is defined as 2 b v = ( V u 1)( V u 2) x v y σ u x y (v ) σ u x y.

7 Structure based Data De-anonymization 7 According to the definition, we obtain the betweenness centrality of nodes in St Andrews as shown in Fig.1 (c). From Fig.1 (c), the nodes in G a and their counterparts in G u agree highly on betweenness centrality. Consequently, betweenness centrality can also be employed in our de-anonymization framework for distinguishing mappings. For the case that the considering data is modeled as a weighted graph, we define the weighted betweenness centrality for v V a and v V u 2 as wb v = ( V a 1)( V a 2) σ wa xy (v) and wb v = 2 ( V u 1)( V u 2) x v y σ wu x y (v ) σ wu x y x v y, respectively, where σ wa xy σ wa xy and σ wa(v) xy (respectively, σx wu y and ) σwa(v x y ) are the number of shortest paths between x and y (respectively, x and y ) and the number of shortest paths between x and y (respectively, x and y ) passing v (respectively, v ) in the weighted graph G a (respectively, G u ). Structural Similarity. From the analysis on real data sets, the local and global structural characteristics carried by degree, closeness, and betweenness centralities of nodes can guide our de-anonymization framework design. Following this direction, to consider and utilize nodes structural property integrally, we define a unified structural measurement, namely structural similarity, to jointly count two nodes both local and global topological properties. First, for v V a and v V u, we define two structural characteristic vectors S a (v) and S u (v ) respectively in terms of their (weighted) degree, closeness, and betweenness centralities as follows: S a (v) = [d v, c v, b v, wd v, wc v, wb v ] and S u (v ) = [d v, c v, b v, wd v, wc v, wb v ]. In S a (v), if G a is unweighted, we set wd v = wc v = wb v = 0; otherwise, we first count d v, c v, and b v by assuming G a is unweighted, and then count wd v, wc v, and wb v in the weighted G a. We also apply the same method to obtain S u (v ) in G u. Based on S a (v) and S u (v ), we define the structural similarity between v V a and v V u, denoted by s S (v, v ), as the cosine similarity between S a (v) and S u (v ), i.e., s S (v, v ) = Sa (v) S u (v ) S a (v) S u (v ), where is the dot product and is the magnitude of a vector. The structural similarity between the nodes in St Andrews and its auxiliary network Facebook is shown in Fig.1 (d), where Counterpart represents s S (v, v = γ(v)) indicating the structural similarity between v V a and its objective reality γ(v) in G u, Min represents min{s S (v, x ) x V u, x γ(v)}, Max represents max{s S (v, x ) x V u, x γ(v)}, and Avg represents s S (v, x ). From Fig.1 (d), we have the following two basic 1 V u 1 x V u,x γ(v) observations. (i) For some nodes with distinguished structural characteristics, e.g., nodes 2, 16, 24, they agree with their counterparts and disagree with other nodes in the auxiliary graphs significantly. Consequently, this suggests that these nodes can be de-anonymized even just based on their structural characteristics. In addition, this confirms that structural properties can be employed in de-anonymization attacks. (ii) For the nodes with indistinctive structural similarities, e.g., nodes 7, 10, 22, 26, exact node mapping relying on structural property alone is difficult or impossible to achieve from the view of graph the-

8 8 Authors Suppressed Due to Excessive Length ory. Fortunately, even if this is true, structural characteristics can also help us to differentiate these indistinctive nodes from most of the other nodes. Hence, structural similarity based coarse granularity de-anonymization is practical. Relative Distance Similarity. In the first phase, we select an initial seed mapping M 0 = M s = {(s 1, s 1), (s 2, s 2),, (s κ, s κ)}. This apriori knowledge can be used to conduct more confident ratiocination in de-anonymization. Therefore, for v V a \ M0 a, we define its relative distance vector, denoted by D a (v) to the seeds in M0 a = {s 1, s 2,, s κ } as D a (v) = [D1(v), a D2(v), a, Dκ(v)], a where Di a(v) = pa (v,s i ) D is the normalized relative distance between v and seed a s i. Similarly, based on the initial seed set M0 u = {s 1, s 2,, s κ} in G u, we can define the relative distance vector for v V u \ M0 u to the seeds in M0 u as D u (v ) = [D1 u (v ), D2 u (v ),, Dκ(v u )], where Di u(v ) = pu (v,s i ) D is the normalized relative distance between v and seed s i. Again, we can define the relative dis- u tance similarity between v V a \M0 a and v V u \M0 u, denoted by s D (v, v ), as the cosine similarity between D a (v) and D u (v ), i.e., s D (v, v ) = Da (v) D u (v ) D a (v) D u (v ). For St Andrews/Facebook, by assuming M s = {(i, i) i = 1, 2,, 6} (which implies M0 a = M0 u = {1, 2, 3, 4, 5, 6}), we can obtain the relative distance similarity scores between the nodes in V a \ M0 a and the nodes in V u \ M0 u as shown in Fig.1 (e). From Fig.1 (e), we can observe the following facts. (i) Some anonymized nodes (which may be indistinctive with respect to structural similarity), e.g., nodes 14, 19, 23, highly agree with their counterparts and meanwhile disagree with other nodes in the auxiliary graph, which suggests that they can be de-anonymized successfully with a high probability by employing the relative distance similarity based metric. (ii) For some nodes, e.g., nodes 11, 21, 26, 27, they are indistinctive on the relative distance similarity with respect to the initial seed selection {1, 2, 3, 4, 5, 6}. To distinguish them, extra effort is expected, e.g., by utilizing structural similarity collaboratively, employing another seed s- election, etc. (ii) The nodes that are significantly distinguishable with respect to structural similarity may be indistinctive with respect to relative distance similarity, and vice versa. This inspires us to design a proper and effective multimeasurement based de-anonymization framework. Inheritance Similarity. Besides the initial seed mapping, the de-anonymized nodes during each iteration, i.e., M k, could provide further knowledge when de-anonymize Λ δ (Mk a). Therefore, for v Λδ (Mk a) and v Λ δ (Mk u ), we define the knowledge provided by the currently mapped results as the inheritance similarity, denoted by s I (v, v ). Formally, s I (v, v ) can be quantified as s I (v, v C ) = N k (v,v ) (1 a v u v max{ a v, u v } ) s(x, x ) if N k (v, v ), (x,x ) N k (v,v ) and s I (v, v ) = 0, otherwise, where C (0, 1) is a constant value representing the similarity loss exponent, N k (v, v ) = (N a (v) N u (v )) M k = {(x, x ) x N a (v), x N u (v ), (x, x ) M k } is the set of mapped pairs between N a (v) and N u (v ) till iteration k, and s(x, x ) [0, 1] is the overall similarity score between x and x which is formally defined in the following subsection. From the definition of s I (v, v ), we can see that (i) if two nodes have more common neighbors which have been mapped, then their inheritance similarity

9 Structure based Data De-anonymization 9 score is high; (ii) we also count the degree similarity in defining s I (v, v ). If the degree difference between v and v is small, then a large weight is given to the inheritance similarity; otherwise, a small weight is given; and (iii) we involve the similarity loss in counting s I (v, v ), which implies the inheritance similarity is decreasing with the distance increasing (iteration increasing) between (v, v ) and the original seed mapping. Now, for St Andrews/Facebook, if we assume half of the nodes have been mapped (the first half according to the ID increasing order), then the inheritance similarity between the rest of the nodes in the anonymized graph and the auxiliary graph is shown in Fig.1 (f). From the result, we can observe that under the half number of nodes been mapped assumption, some nodes, e.g., nodes 16, 19, 24, agree with their counterparts and meanwhile disagree with all the other nodes significantly in the auxiliary graph, which implies that they are potentially easier to be de-anonymized when inheritance similarity is taken as a metric. Note that, in Fig.1 (f), we just randomly assume that the known mapping nodes are the first half nodes in the anonymized graph and auxiliary graph. Actually, the accuracy performance of the inheritance similarity measurement could be improved. This is because there are no necessary correlations among the randomly chosen mapping nodes in Fig.1 (f). Nevertheless, in our de-anonymization framework, the obtained mappings in one iteration depend on the mappings in the previous iteration. This strong correlation among mapped nodes allows for use of the inheritance similarity in practical de-anonymizaiton. De-anonymization Algorithm. From the aforementioned discussion, we find that the differentiability of anonymized nodes is different with respect to different similarity measurements. For instance, some nodes have distinctive topological characteristics, e.g., node 16 in St Andrew, which implies they can be potentially de-anonymized solely based on the structural similarity. On the other hand, for some nodes, due to lacking of distinct topological characteristics, the structural similarity based method can only achieve coarse granularity deanonymization. Nevertheless and fortunately (from the view of adversary), they may become significantly distinguishable with the knowledge of a small amount of auxiliary information, e.g., nodes 14, 19, and 23 in St Andrews are potentially easy to de-anonymize based on relative distance similarity. In summary, the analysis on real data sets suggests to us to define a unified measurement to properly involve multiple similarity metrics for effective de-anonymization. To this end, we define a Unified Similarity (US) measurement by considering the structural similarity, relative distance similarity, and inheritance similarity synthetically for v Λ δ (Mk a) and v Λ δ (Mk u ) in the k-th iteration of our deanonymization framework as s(v, v ) = c S s S (v, v )+c D s D (v, v )+c I s I (v, v ), where c S, c D, c I [0, 1] are constant values indicating the weights of structural similarity, relative distance similarity, and inheritance similarity, respectively, and c S + c D + c I = 1. In addition, we define s(v, v ) = 1 if (v, v ) M s. Now, we are ready to present our US based De-Anonymization (DA) framework, which is shown in Algorithm 1.

10 10 Authors Suppressed Due to Excessive Length Algorithm 1: US based De-Anonymization (DA) 1 M 0 = M s, k = 0, flag = true; 2 while flag = true do 3 calculate Λ δ (M a k ) and Λδ (M u k ); 4 if Λ δ (M a k ) = or Λδ (M u k ) =, output M k, break; 5 for v Λ δ (M a k ) and v Λ δ (M u k ), calculate s(v, v ); 6 construct a weighted bipartite graph B k = (Λ δ (M a k ) Λδ (M u k ), Eb k, W b k ); 7 find a maximum weighted bipartite matching M of B k ; 8 for every (x, x ) M, if s(x, x ) < θ, M = M \ {(x, x )}; 9 let K = max{1, ϵ M } and for (x, x ) M, if s(x, x ) is not the Top-K mapping score in M then 10 M = M \ {(x, x )}; 11 if M =, output M k and break; 12 M k+1 = M k M, k++; In Algorithm 1, B k = (Λ δ (Mk a) Λδ (Mk u), Eb k, W k b ) is a weighted bipartite graph defined on the intended de-anonymizing nodes during the k-th iteration, where Ek b = {lb v,v v Λδ (Mk a), v Λ δ (Mk u)}, and W k b = {wb v,v } is the set of all the possible weights on the links in Ek b. Here, for (v, v ) Ek b, the weight on this link is defined as the US score between the associated two nodes, i.e., wv,v b = s(v, v ). Parameter θ is a constant value named de-anonymization threshold to decide whether a node mapping is accepted or not. Parameter ϵ (0, 1] is the mapping control factor, which is used to limit the maximum number mappings generated during each iteration. By ϵ, even if there are many mappings with similarity score greater than the de-anonymization threshold, we only keep the K = max{1, ϵ M } more confident mappings. We give further explanation on the idea of Algorithm DA as follows. The de-anonymization is bootstrapped with an initial seed mapping and starts the iteration procedure. During each iteration, the intended de-anonymizing nodes are calculated first based on the mappings obtained in the previous iteration followed by calculating the US scores between nodes in Λ δ (Mk a ) and nodes in Λ δ (Mk u ). Subsequently, based on the obtained US scores, a weighted bipartite graph is constructed between nodes in Λ δ (Mk a) and nodes in Λδ (Mk u ). Then, we compute a maximum weighted bipartite matching M on the constructed bipartite graph. To improve the de-anonymization accuracy, we apply two important rules to refine M : (i) by defining a de-anonymization threshold θ, we eliminate the mappings with low US scores in M. This is because we are not confident to take the mappings with low US scores (< θ) as correct de-anonymizaiton, and more improtantly, they may be more accurately de-anonymized in the following iterations by utilizing confident mapping information obtained in this iteration (this can be achieved since we involve inheritance similarity in the US definition); and (ii) we introduce a mapping control factor ϵ, or K equivalently, to limit the maximum number of mappings been accepted as correct de-anonymization. During each iteration, only K mappings with highest US scores will be taken as correct de-anonymization with confidence even if more mappings having US scores greater than the de-anonymizaiton threshold. This strategy has two

11 Structure based Data De-anonymization 11 benefits. On one hand, only highly confident mappings are kept, which could improve the de-anonymization accuracy. On the other hand, for the mappings been rejected, again, they may be better re-de-anonymized in the following iterations by utilizing the more confident knowledge of the Top-K mappings from this iteration. Time and Space Complexities Analysis. Let n = max{ V a, V u } and m = max{ E a, E u }. Then, according to combinatorial analysis, Algorithm 1 s time complexity is O(n 2 log n+mn) and space complexity is O(min{n 2, m+n}). 4 Generalized Scalable De-anonymization De-anonymization on Data Sets without Knowledge of Overlap Size. One predicament in practical de-anonymization, which is omitted in existing deanonymization attacks, is that we do not actually know how large the overlap between the anonymized data and the auxiliary data even we have a lot of auxiliary information available. Therefore, it is unadvisable to do de-anonymization based on the entire anonymized and auxiliary graphs directly, which might cause low de-anonymization accuracy as well as high computational overhead. To address the aforementioned predicament, guarantee the accuracy of DA, and simultaneously improve de-anonymization efficiency and scalability, we extend DA to an Adaptive De-Anonymization framework, denoted by ADA. ADA adaptively de-anonymizes G a starting from a Core Matching Subgraph (CMS), which is formally defined as follows. Let M s be the initial seed mapping between the anonymized graph G a and the auxiliary graph G u. Furthermore, define Vs a = {v v lies on p a (x, y) P a (x, y)}, i.e., Vs a is the union of all the x,y M a 0 nodes on the shortest paths among all the seeds in G a, and Vc a = Vs a Λ δ (Vs a ), i.e., Vc a is the union of Vs a and the δ-hop mapping spanning set of Vs a. Then, we define the initial CMS on G a as the subgraph of G a on Vc a, i.e., G a c = G a [Vc a ]. Similarly, we can define V u {v v lies on p u (x, y ) P u (x, y )} and V u c = V u s s = x,y M0 u Λ δ (V u s ). Then, the initial CMS on G u is G u c = G a [V u c ]. The CMS is generally defined for two purposes. First, we can employ a CMS to adaptively and roughly estimate the overlap between G a and G u in terms of the seed mapping information. On the other hand, we propose to start the deanonymization from the CMSs, by which the de-anonymization is smartly limited to start from two small subgraphs with more information confidence, and thus we could improve the de-anonymization accuracy and reduce the computational overhead. Now, based on CMS, we discuss ADA as shown in Algorithm 2. In Algorithm 2, µ is the adaptive factor which controls the spanning size of the CMS during each adaptive iteration. The basic idea of ADA is as follows. We start the de-anonymization from CMSs G a c and G u c by running DA. If DA is ended with Λ δ (Mk a) = or Λδ (Mk u) =, then the actual overlap between Ga and G u might be larger than G a c /G u c since more nodes could be mapped. Therefore, we enlarge the previous considering CMS G a c /G u c by involving more n-

12 12 Authors Suppressed Due to Excessive Length Algorithm 2: Adaptive De-Anonymization (ADA) 1 generate G a c and Gu c and run DA for Ga c and Gu c ; 2 if Step 1 is ended on the condition that Λ δ (M a k ) = or Λδ (M u k ) = then 3 if Λ µ (V a c ) = or Λµ (V u c ) =, return; 4 V a c = V a c Λµ (V a c ), V a c = V a c Λµ (V a c ); 5 G a c = Ga [V a c ], Gu c = Gu [V u c ]; 6 go to Step 1 to de-anonymize unmapped nodes in updated G a c and Gu c ; i=1 j=1 odes in Λ µ (Vc a )/Λ µ (Vc u ) and repeat the de-anonymization for unmapped nodes. Same as DA, the time and space complexities of ADA are O(n 2 log n + mn) and O(min{n 2, m + n}), respectively. Disconnected Data Sets. In reality, when we employ a graph G a /G u to model the anonymized/auxiliary data, G a /G u might be not connected. In this case, G a and G u can be represented by the union of connected components as m G a i and n G u j respectively, where Ga i and G u j are some connected components. Now, when defining the structural similarity, relative distance similarity, or inheritance similarity, we change the context from G a /G u to components G a i /Gu j. Then, we can apply DA/ADA to conduct de-anonymization. 5 Experiments In this section, we examine the performance of the presented de-anonymization attack on real data sets 3. In each group of experiments, we specify the employed setup and provide comprehensive analysis. The default settings are: α = 1.5, C =, c S =, c D =, c I =, θ =, δ {1, 2}, µ {1, 2, 3}, ϵ = and seed number = 5. Data Sets. In this paper, we employ six well known data sets to examine the effectiveness of the designed de-anonymization framework 4 : St Andrews/Facebook [11][3], Infocom06/DBLP [13][3], Smallbule/Facebook [12][3], ArnetMiner [14], Google+ [10], and Facebook [15]. St Andrews, Infocom06, and Smallbule are three mobility trace data sets. An overview of the three mobility traces is shown in Table 1. We employ the same techniques as in [3] to preprocess the three mobility trace data sets to obtain three anonymized data graphs. To de-anonymize the three anonymized mobility data traces, we employ three auxiliary social network data sets [3] associated with these three mobility traces. For St Andrews, we have a Facebook data set indicating the friend relationships among the T-mote users in the trace. For Infocom06, we employ a coauthor data set consisting of 616 authors obtained from DBLP which indicates the coauthor relationships among all the attendees of INFOCOM For Smallblue, we 3 Due to the space limitation, we put the detailed experimental settings and more results in the Technical Report [17] of this paper. 4 Not that it has been shown that the classical mobility traces of the (latitude, longitude, timestamp) form can also be represented by graph models [16].

13 Structure based Data De-anonymization 13 Table 1. Mobility traces. St Andrews Infocom06 Smallblue Comm. network type WiFi Bluetooth IM Comm. nodes No Contacts No. 18, , ,665 Social network type Facebook DBLP Facebook Social nodes No have a Facebook network among 400 employees from the same enterprise as S- mallblue. Note that, the social network data sets corresponding to Infocom06 and Smallblue are supersets of them with respect to involved users. We also apply the presented de-anonymization attack to social data sets: ArnetMiner [14], Google+ [10], and Facebook [15]. ArnetMiner is an online a- cademic social network, which consists of 1,127 authors and 6,690 coauthor relationships. For each coauthor relationship, there is a weight associated with it indicating the number of coauthored papers by the two authors. As a new social network, Google+ was launched in early July We use two Google+ data sets which were created on July 19 and August 6 in 2011 [10], denoted by JUL and AUG respectively. Both JUL and AUG consist of 5,200 users as well as their profiles. In addition, there were 7,062 connections in JUL and 7,813 connections in AUG. By insight analysis [10], some connections appeared in AUG may not appear in JUL and vise versa. This is because a user may add new connections or disable existing connections. Furthermore, the two data sets are preprocessed as undirected graphs. Since we know the hand labeled ground truth of JUL and AUG, we will examine the presented de-anonymization framework by de-anonymizing JUL with AUG as auxiliary data and then de-anonymizing AUG with JUL as auxiliary data. The Facebook data set consists of 63,731 users and 1,269,502 friend relationships (links). To use this data set to examine the presented de-anonymization attack, we will preprocess it based on the known hand labeled ground truth. De-anonymize Mobility Traces. By utilizing the corresponding social networks as auxiliary information, we exploit the presented de-anonymization algorithm DA to de-anonymize the three well known mobility traces St Andrews, Infocom06, and Smallblue. The results are shown in Fig.2 (a)-(c), where DA denotes the presented US-based de-anonymization framework, and DA-SS, DA-RDS, and DA-IS represent the de-anonymization based on structural similarity solely, relative distance similarity solely, and inheritance similarity solely, respectively. From Fig.2 (a)-(c), we can see that (i) the presented de-aonymization framework is very effective even with a small amount of auxiliary information. For instance, DA can successfully de-anonymize 93.2% of the Infocom06 data just with the knowledge of one seed mapping. For St Andrews and Smallblue, DA can also achieve accuracy of 57.7% and 78.3% respectively with one seed mapping. Furthermore, DA can successfully de-anonymize all the data in St Andrews and Smallblue and 96% of the data of Smallblue with the knowledge of 7 seed mappings; and (ii) the US-based de-anonymization is much more effective and stable than structural, relative distance, or inheritance similarity solely based de-anonymization. The reason is that US tries to distinguish a node from

14 14 Authors Suppressed Due to Excessive Length Accuracy DA DA-SS DA-RDS DA-IS Seed Number Seed Number (a) St Andrews vs Facebook (b) Infocom06 vs DBLP Accuracy DA DA-SS DA-RDS DA-IS Accuracy DA DA-SS DA-RDS DA-IS Seed Number (c) Smallblue vs Facebook Accuracy St Andrews Infocom06 Smallblue Noise Accuracy 4% Noise 8% Noise 12% Noise 16% Noise 20% Noise Seed Number Accuracy De-JUL De-AUG Seed Number (d) DA with noise (e) ArnetMiner (f) Google+ Degree Distribution JUL AUG Degree Seed Number (g) Degree dist. of Google+ (h) Classification (Google+) Accuracy De-JUL-Leaf De-JUL-NonLeaf De-AUG-Leaf De-AUG-NonLeaf Fig. 2. De-anonymize mobility traces. Accuracy-10% overlap False Positive-10% overlap Accuracy-20% overlap False Positive-20% overlap Seed Number (i) Facebook multiple perspectives, which is more efficient and comprehensive. As the analysis shown in Section 3, the nodes can be easily differentiated with respect to one measurement but might be indistinguishable with respect to another measurement. Consequently, synthetically characterizing a node as in US is more powerful and stable. We also examine the robustness of the presented de-anonymization attack to noise and the result is shown in Fig.2 (d) (on the knowledge of 5 seed mappings). In the experiment, we only add noise to the anonymized data. According to the same argument in [2], the noise in the auxiliary data can be counted as noise in the anonymized data. To add p percent of noise to the anonymized data, we randomly add p 2 Ea spurious connections to and meanwhile delete p 2 Ea existing connections from the anonymized graph (a node may become isolated after the noise adding process). For instance, in Fig.2 (d), 20% of noise implies we add 10% spurious connections and delete 10% existing connections of E a from the anonymized data. From Fig.2 (d), we can see that the presented de-anonymization framework is robust to noise. Even if we change 20% of the connections in the anonymized data, the achieved accuracies on St Andrews, Infocom06, and Smallblue are still 8%, 5%, and 6%, respectively. Note

15 Structure based Data De-anonymization 15 that, when 20% of the connections have been changed, the structure of the anonymized data is significantly changed. In practical, if the anonymized data release is initially for research purposes, this structural change may make the data useless. However, by considering multiple perspectives to distinguish a node, the anonymized data can still be de-anonymized as shown in Fig.2 (d), which confirms the assertion in [2] that data set structure change may not provide effective privacy protection. De-anonymize ArnetMiner. ArnetMiner can be modeled by a weighted graph where the weight on each relationship indicates the number of coauthored papers by the two authors. To examine the de-anonymization framework, we first anaonymize ArnetMiner by adding p percent noise as explained in the previous subsection. Furthermore, for each added spurious coauthor relationship, we also randomly generate a weight in [1, A max ], where A max is the maximum weight in the original ArnetMiner graph. Then, we de-anonymize the anonymized data using the original ArnetMiner data and the result is shown in Fig.2 (e). From Fig.2 (e), we can observe that the presented de-anonymization framework is very effective on weighted data. With only knowledge of one seed mapping, more than a half (53.9%) and one-third (34.1%) of the authors can be de-anonymized even with noise levels of 4% and 20%, respectively. Furthermore, when adding 20% of noise to the anonymized data, the presented deanonymization framework achieves 71.5% accuracy if 5 seed mappings are available and 92.8% accuracy if 10 seed mappings are available; (ii) the presented de-anonymization framework is robust to noise on weighted data. When we have 10 or more seed mappings, the accuracy degradation of our de-anonymization algorithm is small even with more noise, e.g., the accuracy is degraded from 99.7% in the 4%-noise case to 96% in the 20%-noise case; and (iii) if the available number of seed mappings is 10, the knowledge brought by more seed mappings cannot improve the de-anonymization accuracy significantly. This is because the achieved accuracy on the knowledge of 10 seed mappings is already about 95%. Therefore, to de-anonymize a data set, it is not necessary to spend efforts to obtain a lot of seed mappings. As in this case, to de-anonymize most of the authors, 5 to 10 seed mappings is sufficient. De-anonymize Google+. Now, we validate the presented de-anonymization framework on the two Google+ data sets JUL and AUG. We first utilize AUG as auxiliary data to deanonymize JUL denoted by De-JUL, i.e., use future data to de-anonymize historical data, and then utilize JUL to de-anonymize AUG denoted by De-AUG, i.e., use historical data to de-anonymize future data. The results is shown in Fig.2 (f). Again, from Fig.2 (f), we can see that the presented de-anonymization framework is very effective. Just based on the knowledge of 5 seed mappings, 57.9% of the users in JUL and 61.6% of the users in AUG can be successfully deanonymized. When 10 seed mappings are available, the de-anonymization accuracy can be improved to 66.8% on JUL and 73.9% on AUG, respectively. However, we also have two other interesting observations from Fig.2 (f): (i) when the number of available seed mappings is above 10, the performance im-

16 16 Authors Suppressed Due to Excessive Length provement is not as significant as on previous data sets (e.g., mobility traces, ArnetMiner) even the de-anonymization accuracy is around 70% for JUL and 75% for AUG; and (ii) De-AUG has a better accuracy than De-JUL, which implies that the AUG data set is easier to de-anonymize than the JUL data set. To explain the two observations, we assert this is because of the structural property of the two data sets. Follow this direction, we investigate the degree distribution of JUL and AUG as shown in Fig.2 (g). From Fig.2 (g), we can see that the degree of both JUL and AUG generally follows a heavy-tailed distribution. In particular, 38.4% of the users in JUL and 34.3% of the users in AUG have degree of one, named leaf users. This is normal since Google+ was launched in early July 2011, and JUL and AUG are data sets crawled in July and August of 2011, respectively. That is also why JUL has more leaf users than AUG (a user connects more people later). Now, we argue that the leaf users cause the difficulty in improving the de-anonymization accuracy. From the perspective of graph theory, the leaf users limit not only the performance of our de-anonymization framework but also the performance of any de-anonymization algorithm. An explanatory example is as follows. Suppose v V a is successfully de-anonymized to v V u. In addition, the two neighbors x and y of v and the two neighbors x and y of v are all leaf users. Then, even x = γ(x), y = γ(y), and v has been successfully de-anonymized to v, it is still difficult to make a decision to map x (or y) to x or y since s(x, x ) s(x, y ) from the view of graph theory. Consequently, to accurately distinguish x, further knowledge such as semantic information is required. To support our argument, we take an insightful look on the experimental results. For each successfully de-anonymized user in JUL and AUG, we classify the user in terms of its degree into one of two sets: leaf user set if its degree is one or non-leaf user set if its degree is greater than one. Then, we re-calculate the de-anonymization accuracy for leaf users and non-leaf users and the results are shown in Fig.2 (h), where De-JUL-Leaf/De-AUG-Leaf represents the ratio of leaf nodes that have been successfully de-anonymized in JUL/AUG while De-JUL-NonLeaf/De-AUG-NonLeaf represents the ratio of non-leaf users that have been successfully de-anonymized in JUL/AUG. From Fig.2 (h), we can see that (i) the successful de-anonymization ratio on non-leaf users is higher than that on leaf users in JUL and AUG. This is because non-leaf users carry more structural information; and (ii) considering the results shown in Fig.2 (f), the deanonymization accuracy on non-leaf users is higher than the overall accuracy and the de-anonymization accuracy on leaf users is lower than the overall accuracy. The two observations on Fig.2 (h) confirms our argument that leaf users are more difficult than non-leaf users to de-anonymize. Furthermore, this is also why De-AUG has higher accuracy than De-JUL in Fig.2 (f). AUG is easier to de-anonymize since it has less leaf users than JUL. De-anonymize Facebook. Finally, we examine ADA on Facebook. Based on the hand labeled ground truth, we partition the data sets into two aboutequal parts utilizing the method employed in [2], and then we take one part as auxiliary data to de-anonymize the other part. When the two parts only have

17 Structure based Data De-anonymization 17 10% and 20% users in common, the achievable accuracy and the induced false positive error of ADA are shown in Fig.2 (i). As a fact, most of the existing de-anonymization attacks are not very effective for the scenario that the overlap between the anaonymized data and the auxiliary data is small or even cannot work totally. Surprisingly, for ADA, we can observe from Fig.2 (i) that (i) based on the proposed CMS, ADA can successfully de-anonymize 62.4% of the common users with false positive error of 34.1% when the overlap is 10% and 71.8% of the common users with false positive error of 25.6% when the overlap is 20% with the knowledge of just 5 seed mappings; (ii) the de-anonymization accuracy is improved to 81.3% (resp., 85.6%) and the false positive error is decreased to 16.8% (resp., 13%) when the overlap is 10% and 10 (resp., 20) seed mappings available, and the de-anonymization accuracy is improved to 87% (resp., 9%) and the false positive error is decreased to 11.6% (resp., 8.6%) when the overlap is 20% and 10 (resp., 20) seed mappings available, which demonstrates that ADA is very effective in dealing with the partial data overlap situation; and (iii) ADA has a higher de-anonymization accuracy and lower false positive error in the 20% data overlap scenario than that in the 10% data overlap scenario. This is because a larger overlap size implies a common node will carry much more similar structural information in both graphs, and thus it can be de-anonymized with higher probability and accuracy. From Fig.2 (i), we can also see that 10 seed mappings are sufficient to achieve high de-anonymization accuracy and low false positive error. Therefore, ADA is applicable with efficiency and performance guarantee in practical. 6 Conclusion In this paper, we present a novel and effective de-anonymization attack based on a Unified Similarity (US) measurement which synthetically incorporates multiple data structural factors. The experimental results demonstrate that the presented de-anonymization framework is very effective and robust to noise. Acknowledgments Mudhakar Srivatsa s research was sponsored by US Army Research laboratory and the UK Ministry of Defence and was accomplished under Agreement Number W911NF The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the US Army Research Laboratory, the U.S. Government, the UK Ministry of Defense, or the UK Government. The US and UK Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Jing S. He s research is partly supported by the Kennesaw State University College of Science and Mathematics the Interdisciplinary Research Opportunities (IDROP) Program.

Structural Data De-anonymization: Quantification, Practice, and Implications

Structural Data De-anonymization: Quantification, Practice, and Implications Structural Data De-anonymization: Quantification, Practice, and Implications ABSTRACT Shouling Ji School of Electrical and Computer Engineering Georgia Institute of Technology sji@gatech.edu Mudhakar Srivatsa

More information

MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL

MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL Jing (Selena) He Department of Computer Science, Kennesaw State University Shouling Ji,

More information

An Efficient reconciliation algorithm for social networks

An Efficient reconciliation algorithm for social networks An Efficient reconciliation algorithm for social networks Silvio Lattanzi (Google Research NY) Joint work with: Nitish Korula (Google Research NY) ICERM Stochastic Graph Models Outline Graph reconciliation

More information

Patrol: Revealing Zero-day Attack Paths through Network-wide System Object Dependencies

Patrol: Revealing Zero-day Attack Paths through Network-wide System Object Dependencies Patrol: Revealing Zero-day Attack Paths through Network-wide System Object Dependencies Jun Dai, Xiaoyan Sun, and Peng Liu College of Information Sciences and Technology Pennsylvania State University,

More information

Node Failure Localization: Theorem Proof

Node Failure Localization: Theorem Proof Node Failure Localization: Theorem Proof Liang Ma, Ting He, Ananthram Swami, Don Towsley, and Kin K. Leung IBM T. J. Watson Research Center, Yorktown, NY, USA. Email: {maliang, the}@us.ibm.com Army Research

More information

On a Conjecture of Thomassen

On a Conjecture of Thomassen On a Conjecture of Thomassen Michelle Delcourt Department of Mathematics University of Illinois Urbana, Illinois 61801, U.S.A. delcour2@illinois.edu Asaf Ferber Department of Mathematics Yale University,

More information

Web Structure Mining Nodes, Links and Influence

Web Structure Mining Nodes, Links and Influence Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.

More information

Strongly chordal and chordal bipartite graphs are sandwich monotone

Strongly chordal and chordal bipartite graphs are sandwich monotone Strongly chordal and chordal bipartite graphs are sandwich monotone Pinar Heggernes Federico Mancini Charis Papadopoulos R. Sritharan Abstract A graph class is sandwich monotone if, for every pair of its

More information

Minimum-sized Positive Influential Node Set Selection for Social Networks: Considering Both Positive and Negative Influences

Minimum-sized Positive Influential Node Set Selection for Social Networks: Considering Both Positive and Negative Influences Minimum-sized Positive Influential Node Set Selection for Social Networks: Considering Both Positive and Negative Influences Jing (Selena) He, Shouling Ji, Xiaojing Liao, Hisham M. Haddad, Raheem Beyah

More information

Exact Algorithms for Dominating Induced Matching Based on Graph Partition

Exact Algorithms for Dominating Induced Matching Based on Graph Partition Exact Algorithms for Dominating Induced Matching Based on Graph Partition Mingyu Xiao School of Computer Science and Engineering University of Electronic Science and Technology of China Chengdu 611731,

More information

ON SOCIAL-TEMPORAL GROUP QUERY WITH ACQUAINTANCE CONSTRAINT

ON SOCIAL-TEMPORAL GROUP QUERY WITH ACQUAINTANCE CONSTRAINT 1 ON SOCIAL-TEMPORAL GROUP QUERY WITH ACQUAINTANCE CONSTRAINT D.N Yang Y.L Chen W.C Lee M.S Chen Sep 2011 Presented by Roi Fridburg AGENDA Activity Planning Social Graphs Proposed Algorithms SGSelect SGTSelect

More information

Modelling self-organizing networks

Modelling self-organizing networks Paweł Department of Mathematics, Ryerson University, Toronto, ON Cargese Fall School on Random Graphs (September 2015) Outline 1 Introduction 2 Spatial Preferred Attachment (SPA) Model 3 Future work Multidisciplinary

More information

arxiv: v2 [math.co] 7 Jan 2016

arxiv: v2 [math.co] 7 Jan 2016 Global Cycle Properties in Locally Isometric Graphs arxiv:1506.03310v2 [math.co] 7 Jan 2016 Adam Borchert, Skylar Nicol, Ortrud R. Oellermann Department of Mathematics and Statistics University of Winnipeg,

More information

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices Communities Via Laplacian Matrices Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices The Laplacian Approach As with betweenness approach, we want to divide a social graph into

More information

Robust Network Codes for Unicast Connections: A Case Study

Robust Network Codes for Unicast Connections: A Case Study Robust Network Codes for Unicast Connections: A Case Study Salim Y. El Rouayheb, Alex Sprintson, and Costas Georghiades Department of Electrical and Computer Engineering Texas A&M University College Station,

More information

Bounded Budget Betweenness Centrality Game for Strategic Network Formations

Bounded Budget Betweenness Centrality Game for Strategic Network Formations Bounded Budget Betweenness Centrality Game for Strategic Network Formations Xiaohui Bei 1, Wei Chen 2, Shang-Hua Teng 3, Jialin Zhang 1, and Jiajie Zhu 4 1 Tsinghua University, {bxh08,zhanggl02}@mails.tsinghua.edu.cn

More information

The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth

The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth Gregory Gutin, Mark Jones, and Magnus Wahlström Royal Holloway, University of London Egham, Surrey TW20 0EX, UK Abstract In the

More information

Approximating the Covariance Matrix with Low-rank Perturbations

Approximating the Covariance Matrix with Low-rank Perturbations Approximating the Covariance Matrix with Low-rank Perturbations Malik Magdon-Ismail and Jonathan T. Purnell Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 {magdon,purnej}@cs.rpi.edu

More information

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Rani M. R, Mohith Jagalmohanan, R. Subashini Binary matrices having simultaneous consecutive

More information

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search 6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search Daron Acemoglu and Asu Ozdaglar MIT September 30, 2009 1 Networks: Lecture 7 Outline Navigation (or decentralized search)

More information

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria 12. LOCAL SEARCH gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley h ttp://www.cs.princeton.edu/~wayne/kleinberg-tardos

More information

On Threshold Models over Finite Networks

On Threshold Models over Finite Networks On Threshold Models over Finite Networks Elie M. Adam, Munther A. Dahleh, Asuman Ozdaglar Abstract We study a model for cascade effects over finite networks based on a deterministic binary linear threshold

More information

arxiv: v1 [cs.dm] 26 Apr 2010

arxiv: v1 [cs.dm] 26 Apr 2010 A Simple Polynomial Algorithm for the Longest Path Problem on Cocomparability Graphs George B. Mertzios Derek G. Corneil arxiv:1004.4560v1 [cs.dm] 26 Apr 2010 Abstract Given a graph G, the longest path

More information

Final Exam, Machine Learning, Spring 2009

Final Exam, Machine Learning, Spring 2009 Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

More information

Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach

Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach Author: Jaewon Yang, Jure Leskovec 1 1 Venue: WSDM 2013 Presenter: Yupeng Gu 1 Stanford University 1 Background Community

More information

Geometric Steiner Trees

Geometric Steiner Trees Geometric Steiner Trees From the book: Optimal Interconnection Trees in the Plane By Marcus Brazil and Martin Zachariasen Part 3: Computational Complexity and the Steiner Tree Problem Marcus Brazil 2015

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification

More information

4 CONNECTED PROJECTIVE-PLANAR GRAPHS ARE HAMILTONIAN. Robin Thomas* Xingxing Yu**

4 CONNECTED PROJECTIVE-PLANAR GRAPHS ARE HAMILTONIAN. Robin Thomas* Xingxing Yu** 4 CONNECTED PROJECTIVE-PLANAR GRAPHS ARE HAMILTONIAN Robin Thomas* Xingxing Yu** School of Mathematics Georgia Institute of Technology Atlanta, Georgia 30332, USA May 1991, revised 23 October 1993. Published

More information

Approximation Algorithms for Re-optimization

Approximation Algorithms for Re-optimization Approximation Algorithms for Re-optimization DRAFT PLEASE DO NOT CITE Dean Alderucci Table of Contents 1.Introduction... 2 2.Overview of the Current State of Re-Optimization Research... 3 2.1.General Results

More information

Lower Bounds for Testing Bipartiteness in Dense Graphs

Lower Bounds for Testing Bipartiteness in Dense Graphs Lower Bounds for Testing Bipartiteness in Dense Graphs Andrej Bogdanov Luca Trevisan Abstract We consider the problem of testing bipartiteness in the adjacency matrix model. The best known algorithm, due

More information

Kristina Lerman USC Information Sciences Institute

Kristina Lerman USC Information Sciences Institute Rethinking Network Structure Kristina Lerman USC Information Sciences Institute Università della Svizzera Italiana, December 16, 2011 Measuring network structure Central nodes Community structure Strength

More information

Spatial Decision Tree: A Novel Approach to Land-Cover Classification

Spatial Decision Tree: A Novel Approach to Land-Cover Classification Spatial Decision Tree: A Novel Approach to Land-Cover Classification Zhe Jiang 1, Shashi Shekhar 1, Xun Zhou 1, Joseph Knight 2, Jennifer Corcoran 2 1 Department of Computer Science & Engineering 2 Department

More information

Interact with Strangers

Interact with Strangers Interact with Strangers RATE: Recommendation-aware Trust Evaluation in Online Social Networks Wenjun Jiang 1, 2, Jie Wu 2, and Guojun Wang 1 1. School of Information Science and Engineering, Central South

More information

arxiv: v1 [cs.ds] 2 Oct 2018

arxiv: v1 [cs.ds] 2 Oct 2018 Contracting to a Longest Path in H-Free Graphs Walter Kern 1 and Daniël Paulusma 2 1 Department of Applied Mathematics, University of Twente, The Netherlands w.kern@twente.nl 2 Department of Computer Science,

More information

Decomposition of random graphs into complete bipartite graphs

Decomposition of random graphs into complete bipartite graphs Decomposition of random graphs into complete bipartite graphs Fan Chung Xing Peng Abstract We consider the problem of partitioning the edge set of a graph G into the minimum number τg of edge-disjoint

More information

arxiv: v1 [cs.dc] 4 Oct 2018

arxiv: v1 [cs.dc] 4 Oct 2018 Distributed Reconfiguration of Maximal Independent Sets Keren Censor-Hillel 1 and Mikael Rabie 2 1 Department of Computer Science, Technion, Israel, ckeren@cs.technion.ac.il 2 Aalto University, Helsinki,

More information

The concentration of the chromatic number of random graphs

The concentration of the chromatic number of random graphs The concentration of the chromatic number of random graphs Noga Alon Michael Krivelevich Abstract We prove that for every constant δ > 0 the chromatic number of the random graph G(n, p) with p = n 1/2

More information

Information Propagation Analysis of Social Network Using the Universality of Random Matrix

Information Propagation Analysis of Social Network Using the Universality of Random Matrix Information Propagation Analysis of Social Network Using the Universality of Random Matrix Yusuke Sakumoto, Tsukasa Kameyama, Chisa Takano and Masaki Aida Tokyo Metropolitan University, 6-6 Asahigaoka,

More information

On the Complexity of the Minimum Independent Set Partition Problem

On the Complexity of the Minimum Independent Set Partition Problem On the Complexity of the Minimum Independent Set Partition Problem T-H. Hubert Chan 1, Charalampos Papamanthou 2, and Zhichao Zhao 1 1 Department of Computer Science the University of Hong Kong {hubert,zczhao}@cs.hku.hk

More information

arxiv: v1 [cs.ds] 3 Feb 2018

arxiv: v1 [cs.ds] 3 Feb 2018 A Model for Learned Bloom Filters and Related Structures Michael Mitzenmacher 1 arxiv:1802.00884v1 [cs.ds] 3 Feb 2018 Abstract Recent work has suggested enhancing Bloom filters by using a pre-filter, based

More information

The Algorithmic Aspects of the Regularity Lemma

The Algorithmic Aspects of the Regularity Lemma The Algorithmic Aspects of the Regularity Lemma N. Alon R. A. Duke H. Lefmann V. Rödl R. Yuster Abstract The Regularity Lemma of Szemerédi is a result that asserts that every graph can be partitioned in

More information

CS 224w: Problem Set 1

CS 224w: Problem Set 1 CS 224w: Problem Set 1 Tony Hyun Kim October 8, 213 1 Fighting Reticulovirus avarum 1.1 Set of nodes that will be infected We are assuming that once R. avarum infects a host, it always infects all of the

More information

CMPSCI 611 Advanced Algorithms Midterm Exam Fall 2015

CMPSCI 611 Advanced Algorithms Midterm Exam Fall 2015 NAME: CMPSCI 611 Advanced Algorithms Midterm Exam Fall 015 A. McGregor 1 October 015 DIRECTIONS: Do not turn over the page until you are told to do so. This is a closed book exam. No communicating with

More information

On Top-k Structural. Similarity Search. Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada

On Top-k Structural. Similarity Search. Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada On Top-k Structural 1 Similarity Search Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China 2014/10/14 Pei

More information

THE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS

THE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS THE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS GÁBOR HORVÁTH, CHRYSTOPHER L. NEHANIV, AND KÁROLY PODOSKI Dedicated to John Rhodes on the occasion of his 80th birthday.

More information

CS 350 Algorithms and Complexity

CS 350 Algorithms and Complexity 1 CS 350 Algorithms and Complexity Fall 2015 Lecture 15: Limitations of Algorithmic Power Introduction to complexity theory Andrew P. Black Department of Computer Science Portland State University Lower

More information

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,

More information

Performance Evaluation. Analyzing competitive influence maximization problems with partial information: An approximation algorithmic framework

Performance Evaluation. Analyzing competitive influence maximization problems with partial information: An approximation algorithmic framework Performance Evaluation ( ) Contents lists available at ScienceDirect Performance Evaluation journal homepage: www.elsevier.com/locate/peva Analyzing competitive influence maximization problems with partial

More information

Capacity Releasing Diffusion for Speed and Locality

Capacity Releasing Diffusion for Speed and Locality 000 00 002 003 004 005 006 007 008 009 00 0 02 03 04 05 06 07 08 09 020 02 022 023 024 025 026 027 028 029 030 03 032 033 034 035 036 037 038 039 040 04 042 043 044 045 046 047 048 049 050 05 052 053 054

More information

Towards Detecting Protein Complexes from Protein Interaction Data

Towards Detecting Protein Complexes from Protein Interaction Data Towards Detecting Protein Complexes from Protein Interaction Data Pengjun Pei 1 and Aidong Zhang 1 Department of Computer Science and Engineering State University of New York at Buffalo Buffalo NY 14260,

More information

Heuristics for The Whitehead Minimization Problem

Heuristics for The Whitehead Minimization Problem Heuristics for The Whitehead Minimization Problem R.M. Haralick, A.D. Miasnikov and A.G. Myasnikov November 11, 2004 Abstract In this paper we discuss several heuristic strategies which allow one to solve

More information

Unsupervised Anomaly Detection for High Dimensional Data

Unsupervised Anomaly Detection for High Dimensional Data Unsupervised Anomaly Detection for High Dimensional Data Department of Mathematics, Rowan University. July 19th, 2013 International Workshop in Sequential Methodologies (IWSM-2013) Outline of Talk Motivation

More information

k-symmetry Model: A General Framework To Achieve Identity Anonymization In Social Networks

k-symmetry Model: A General Framework To Achieve Identity Anonymization In Social Networks k-symmetry Model: A General Framework To Achieve Identity Anonymization In Social Networks Wentao Wu School of Computer Science and Technology, Fudan University, Shanghai, China 1 Introduction Social networks

More information

Determining the Diameter of Small World Networks

Determining the Diameter of Small World Networks Determining the Diameter of Small World Networks Frank W. Takes & Walter A. Kosters Leiden University, The Netherlands CIKM 2011 October 2, 2011 Glasgow, UK NWO COMPASS project (grant #12.0.92) 1 / 30

More information

Testing Graph Isomorphism

Testing Graph Isomorphism Testing Graph Isomorphism Eldar Fischer Arie Matsliah Abstract Two graphs G and H on n vertices are ɛ-far from being isomorphic if at least ɛ ( n 2) edges must be added or removed from E(G) in order to

More information

Out-colourings of Digraphs

Out-colourings of Digraphs Out-colourings of Digraphs N. Alon J. Bang-Jensen S. Bessy July 13, 2017 Abstract We study vertex colourings of digraphs so that no out-neighbourhood is monochromatic and call such a colouring an out-colouring.

More information

Fixed Parameter Algorithms for Interval Vertex Deletion and Interval Completion Problems

Fixed Parameter Algorithms for Interval Vertex Deletion and Interval Completion Problems Fixed Parameter Algorithms for Interval Vertex Deletion and Interval Completion Problems Arash Rafiey Department of Informatics, University of Bergen, Norway arash.rafiey@ii.uib.no Abstract We consider

More information

Limitations of Algorithm Power

Limitations of Algorithm Power Limitations of Algorithm Power Objectives We now move into the third and final major theme for this course. 1. Tools for analyzing algorithms. 2. Design strategies for designing algorithms. 3. Identifying

More information

Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems

Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems 1 Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems Mauro Franceschelli, Andrea Gasparri, Alessandro Giua, and Giovanni Ulivi Abstract In this paper the formation stabilization problem

More information

Lecture 13: Spectral Graph Theory

Lecture 13: Spectral Graph Theory CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 13: Spectral Graph Theory Lecturer: Shayan Oveis Gharan 11/14/18 Disclaimer: These notes have not been subjected to the usual scrutiny reserved

More information

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity CS322 Project Writeup Semih Salihoglu Stanford University 353 Serra Street Stanford, CA semih@stanford.edu

More information

1.3 Vertex Degrees. Vertex Degree for Undirected Graphs: Let G be an undirected. Vertex Degree for Digraphs: Let D be a digraph and y V (D).

1.3 Vertex Degrees. Vertex Degree for Undirected Graphs: Let G be an undirected. Vertex Degree for Digraphs: Let D be a digraph and y V (D). 1.3. VERTEX DEGREES 11 1.3 Vertex Degrees Vertex Degree for Undirected Graphs: Let G be an undirected graph and x V (G). The degree d G (x) of x in G: the number of edges incident with x, each loop counting

More information

IMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback

IMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback 9/3/3 Admin Assignment 3: - how did it go? - do the experiments help? Assignment 4 IMBALANCED DATA Course feedback David Kauchak CS 45 Fall 3 Phishing 9/3/3 Setup Imbalanced data. for hour, google collects

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

Privacy-preserving Data Mining

Privacy-preserving Data Mining Privacy-preserving Data Mining What is [data] privacy? Privacy and Data Mining Privacy-preserving Data mining: main approaches Anonymization Obfuscation Cryptographic hiding Challenges Definition of privacy

More information

Graphs with large maximum degree containing no odd cycles of a given length

Graphs with large maximum degree containing no odd cycles of a given length Graphs with large maximum degree containing no odd cycles of a given length Paul Balister Béla Bollobás Oliver Riordan Richard H. Schelp October 7, 2002 Abstract Let us write f(n, ; C 2k+1 ) for the maximal

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

Partial cubes: structures, characterizations, and constructions

Partial cubes: structures, characterizations, and constructions Partial cubes: structures, characterizations, and constructions Sergei Ovchinnikov San Francisco State University, Mathematics Department, 1600 Holloway Ave., San Francisco, CA 94132 Abstract Partial cubes

More information

Resilient Asymptotic Consensus in Robust Networks

Resilient Asymptotic Consensus in Robust Networks Resilient Asymptotic Consensus in Robust Networks Heath J. LeBlanc, Member, IEEE, Haotian Zhang, Student Member, IEEE, Xenofon Koutsoukos, Senior Member, IEEE, Shreyas Sundaram, Member, IEEE Abstract This

More information

Influence Maximization in Dynamic Social Networks

Influence Maximization in Dynamic Social Networks Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang and Xiaoming Sun Department of Computer Science and Technology, Tsinghua University Department of Computer

More information

Personalized Social Recommendations Accurate or Private

Personalized Social Recommendations Accurate or Private Personalized Social Recommendations Accurate or Private Presented by: Lurye Jenny Paper by: Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma Outline Introduction Motivation The model General

More information

Data Mining Recitation Notes Week 3

Data Mining Recitation Notes Week 3 Data Mining Recitation Notes Week 3 Jack Rae January 28, 2013 1 Information Retrieval Given a set of documents, pull the (k) most similar document(s) to a given query. 1.1 Setup Say we have D documents

More information

Database Privacy: k-anonymity and de-anonymization attacks

Database Privacy: k-anonymity and de-anonymization attacks 18734: Foundations of Privacy Database Privacy: k-anonymity and de-anonymization attacks Piotr Mardziel or Anupam Datta CMU Fall 2018 Publicly Released Large Datasets } Useful for improving recommendation

More information

The Lovász Local Lemma : A constructive proof

The Lovász Local Lemma : A constructive proof The Lovász Local Lemma : A constructive proof Andrew Li 19 May 2016 Abstract The Lovász Local Lemma is a tool used to non-constructively prove existence of combinatorial objects meeting a certain conditions.

More information

Improved Quantum Algorithm for Triangle Finding via Combinatorial Arguments

Improved Quantum Algorithm for Triangle Finding via Combinatorial Arguments Improved Quantum Algorithm for Triangle Finding via Combinatorial Arguments François Le Gall The University of Tokyo Technical version available at arxiv:1407.0085 [quant-ph]. Background. Triangle finding

More information

Parameterized Algorithms for the H-Packing with t-overlap Problem

Parameterized Algorithms for the H-Packing with t-overlap Problem Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 18, no. 4, pp. 515 538 (2014) DOI: 10.7155/jgaa.00335 Parameterized Algorithms for the H-Packing with t-overlap Problem Alejandro López-Ortiz

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

EVOLUTION is manifested to be a common property of. Interest-aware Information Diffusion in Evolving Social Networks

EVOLUTION is manifested to be a common property of. Interest-aware Information Diffusion in Evolving Social Networks Interest-aware Information Diffusion in Evolving Social Networks Jiaqi Liu, Luoyi Fu, Zhe Liu, Xiao-Yang Liu and Xinbing Wang Abstract Many realistic wireless social networks are manifested to be evolving

More information

1.1 P, NP, and NP-complete

1.1 P, NP, and NP-complete CSC5160: Combinatorial Optimization and Approximation Algorithms Topic: Introduction to NP-complete Problems Date: 11/01/2008 Lecturer: Lap Chi Lau Scribe: Jerry Jilin Le This lecture gives a general introduction

More information

Maximising the number of induced cycles in a graph

Maximising the number of induced cycles in a graph Maximising the number of induced cycles in a graph Natasha Morrison Alex Scott April 12, 2017 Abstract We determine the maximum number of induced cycles that can be contained in a graph on n n 0 vertices,

More information

Optimal Fractal Coding is NP-Hard 1

Optimal Fractal Coding is NP-Hard 1 Optimal Fractal Coding is NP-Hard 1 (Extended Abstract) Matthias Ruhl, Hannes Hartenstein Institut für Informatik, Universität Freiburg Am Flughafen 17, 79110 Freiburg, Germany ruhl,hartenst@informatik.uni-freiburg.de

More information

Lecture 14 - P v.s. NP 1

Lecture 14 - P v.s. NP 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 27, 2018 Lecture 14 - P v.s. NP 1 In this lecture we start Unit 3 on NP-hardness and approximation

More information

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. CS 188 Spring 2017 Introduction to Artificial Intelligence Midterm V2 You have approximately 80 minutes. The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. Mark

More information

Internal link prediction: a new approach for predicting links in bipartite graphs

Internal link prediction: a new approach for predicting links in bipartite graphs Internal link prediction: a new approach for predicting links in bipartite graphs Oussama llali, lémence Magnien and Matthieu Latapy LIP6 NRS and Université Pierre et Marie urie (UPM Paris 6) 4 place Jussieu

More information

Public Key Exchange by Neural Networks

Public Key Exchange by Neural Networks Public Key Exchange by Neural Networks Zahir Tezcan Computer Engineering, Bilkent University, 06532 Ankara zahir@cs.bilkent.edu.tr Abstract. This work is a survey on the concept of neural cryptography,

More information

Modeling, Analysis, and Control of Information Propagation in Multi-layer and Multiplex Networks. Osman Yağan

Modeling, Analysis, and Control of Information Propagation in Multi-layer and Multiplex Networks. Osman Yağan Modeling, Analysis, and Control of Information Propagation in Multi-layer and Multiplex Networks Osman Yağan Department of ECE Carnegie Mellon University Joint work with Y. Zhuang and V. Gligor (CMU) Alex

More information

Enumerating minimal connected dominating sets in graphs of bounded chordality,

Enumerating minimal connected dominating sets in graphs of bounded chordality, Enumerating minimal connected dominating sets in graphs of bounded chordality, Petr A. Golovach a,, Pinar Heggernes a, Dieter Kratsch b a Department of Informatics, University of Bergen, N-5020 Bergen,

More information

3 : Representation of Undirected GM

3 : Representation of Undirected GM 10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:

More information

Limiting Link Disclosure in Social Network Analysis through Subgraph-Wise Perturbation

Limiting Link Disclosure in Social Network Analysis through Subgraph-Wise Perturbation Limiting Link Disclosure in Social Network Analysis through Subgraph-Wise Perturbation Amin Milani Fard School of Computing Science Simon Fraser University Burnaby, BC, Canada milanifard@cs.sfu.ca Ke Wang

More information

P P P NP-Hard: L is NP-hard if for all L NP, L L. Thus, if we could solve L in polynomial. Cook's Theorem and Reductions

P P P NP-Hard: L is NP-hard if for all L NP, L L. Thus, if we could solve L in polynomial. Cook's Theorem and Reductions Summary of the previous lecture Recall that we mentioned the following topics: P: is the set of decision problems (or languages) that are solvable in polynomial time. NP: is the set of decision problems

More information

Permutations and Combinations

Permutations and Combinations Permutations and Combinations Permutations Definition: Let S be a set with n elements A permutation of S is an ordered list (arrangement) of its elements For r = 1,..., n an r-permutation of S is an ordered

More information

Wireless Network Security Spring 2016

Wireless Network Security Spring 2016 Wireless Network Security Spring 2016 Patrick Tague Class #19 Vehicular Network Security & Privacy 2016 Patrick Tague 1 Class #19 Review of some vehicular network stuff How wireless attacks affect vehicle

More information

Three-Way Analysis of Facial Similarity Judgments

Three-Way Analysis of Facial Similarity Judgments Three-Way Analysis of Facial Similarity Judgments Daryl H. Hepting, Hadeel Hatim Bin Amer, and Yiyu Yao University of Regina, Regina, SK, S4S 0A2, CANADA hepting@cs.uregina.ca, binamerh@cs.uregina.ca,

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 21, 2014 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 21, 2014 1 / 52 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

Some Complexity Problems on Single Input Double Output Controllers

Some Complexity Problems on Single Input Double Output Controllers Some Complexity Problems on Single Input Double Output Controllers K. M. Hangos 1 Zs. Tuza 1,2, A. Yeo 3 1 Computer and Automation Institute, Hungarian Academy of Sciences, H-1111 Budapest, Kende u. 13

More information

Featured Articles Advanced Research into AI Ising Computer

Featured Articles Advanced Research into AI Ising Computer 156 Hitachi Review Vol. 65 (2016), No. 6 Featured Articles Advanced Research into AI Ising Computer Masanao Yamaoka, Ph.D. Chihiro Yoshimura Masato Hayashi Takuya Okuyama Hidetaka Aoki Hiroyuki Mizuno,

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

Notes 6 : First and second moment methods

Notes 6 : First and second moment methods Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative

More information

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering Michigan Technological University Houghton, Michigan

More information

The Simultaneous Local Metric Dimension of Graph Families

The Simultaneous Local Metric Dimension of Graph Families Article The Simultaneous Local Metric Dimension of Graph Families Gabriel A. Barragán-Ramírez 1, Alejandro Estrada-Moreno 1, Yunior Ramírez-Cruz 2, * and Juan A. Rodríguez-Velázquez 1 1 Departament d Enginyeria

More information