CMU SCS Mining Large Graphs: Patterns, Anomalies, and Fraud Detection

Size: px
Start display at page:

Download "CMU SCS Mining Large Graphs: Patterns, Anomalies, and Fraud Detection"

Transcription

1 Mining Large Graphs: Patterns, Anomalies, and Fraud Detection Christos Faloutsos CMU

2 Thank you! Prof. Julian McAuley Nicholas Urioste UCSD'16 (c) 2016, C. Faloutsos 2

3 Roadmap Introduction Motivation Why study (big) graphs? Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors Conclusions UCSD'16 (c) 2016, C. Faloutsos 3

4 Graphs - why should we care? ~1B nodes (web sites) ~6B edges (http links) YahooWeb graph UCSD'16 (c) 2016, C. Faloutsos 4

5 Graphs - why should we care? >$10B; ~1B users UCSD'16 (c) 2016, C. Faloutsos 5

6 Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez 91] UCSD'16 (c) 2016, C. Faloutsos 6

7 Graphs - why should we care? web-log ( blog ) news propagation computer network security: /ip traffic and anomaly detection Recommendation systems... Many-to-many db relationship -> graph UCSD'16 (c) 2016, C. Faloutsos 7

8 Motivating problems P1: patterns? Fraud detection? P2: patterns in time-evolving graphs / tensors destination time UCSD'16 (c) 2016, C. Faloutsos 8

9 Motivating problems P1: patterns? Fraud detection? Patterns anomalies P2: patterns in time-evolving graphs / tensors destination time UCSD'16 (c) 2016, C. Faloutsos 9

10 Motivating problems P1: patterns? Fraud detection? Patterns anomalies* P2: patterns in time-evolving graphs / tensors destination time * Robust Random Cut Forest Based Anomaly Detection on Streams Sudipto Guha, Nina Mishra, Gourav Roy, UCSD'16 (c) 2016, C. Faloutsos 10 Okke Schrijvers, ICML 16

11 Roadmap Introduction Motivation Why study (big) graphs? Part#1: Patterns & fraud detection Part#2: time-evolving graphs; tensors Conclusions UCSD'16 (c) 2016, C. Faloutsos 11

12 Part 1: Patterns, & fraud detection UCSD'16 (c) 2016, C. Faloutsos 12

13 Laws and patterns Q1: Are real graphs random? UCSD'16 (c) 2016, C. Faloutsos 13

14 Laws and patterns Q1: Are real graphs random? A1: NO!! Diameter ( 6 degrees ; Kevin Bacon ) in- and out- degree distributions other (surprising) patterns So, let s look at the data UCSD'16 (c) 2016, C. Faloutsos 14

15 Solution# S.1 Power law in the degree distribution [Faloutsos x 3 SIGCOMM99] log(degree) ibm.com internet domains att.com log(rank) UCSD'16 (c) 2016, C. Faloutsos 15

16 Solution# S.1 Power law in the degree distribution [Faloutsos x 3 SIGCOMM99] log(degree) ibm.com internet domains att.com log(rank) UCSD'16 (c) 2016, C. Faloutsos 16

17 S2: connected component sizes Connected Components 4 observations: Count 1.4B nodes 6B edges UCSD'16 Size (c) 2016, C. Faloutsos 17

18 S2: connected component sizes Connected Components Count 1) 10K x larger than next UCSD'16 Size (c) 2016, C. Faloutsos 18

19 S2: connected component sizes Connected Components Count 2) ~0.7B singleton nodes UCSD'16 Size (c) 2016, C. Faloutsos 19

20 S2: connected component sizes Connected Components Count 3) SLOPE! UCSD'16 Size (c) 2016, C. Faloutsos 20

21 S2: connected component sizes Connected Components Count 300-size cmpt X size cmpt Why? X 65. Why? 4) Spikes! UCSD'16 Size (c) 2016, C. Faloutsos 21

22 S2: connected component sizes Connected Components Count suspicious financial-advice sites (not existing now) UCSD'16 Size (c) 2016, C. Faloutsos 22

23 Roadmap Introduction Motivation Part#1: Patterns in graphs Patterns: Degree; Triangles Anomaly/fraud detection Graph understanding Part#2: time-evolving graphs; tensors Conclusions UCSD'16 (c) 2016, C. Faloutsos 23

24 Solution# S.3: Triangle Laws Real social networks have a lot of triangles UCSD'16 (c) 2016, C. Faloutsos 24

25 Solution# S.3: Triangle Laws Real social networks have a lot of triangles Friends of friends are friends Any patterns? 2x the friends, 2x the triangles? UCSD'16 (c) 2016, C. Faloutsos 25

26 Triangle Law: #S.3 [Tsourakakis ICDM 2008] Reuters SN Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n 1.6 triangles UCSD'16 (c) 2016, C. Faloutsos 26

27 Triangle counting for large graphs???? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD 11] UCSD'16 (c) 2016, C. Faloutsos 27

28 Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD 11] UCSD'16 (c) 2016, C. Faloutsos 28

29 Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD 11] UCSD'16 (c) 2016, C. Faloutsos 29

30 Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD 11] UCSD'16 (c) 2016, C. Faloutsos 30

31 Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD 11] UCSD'16 (c) 2016, C. Faloutsos 31

32 MORE Graph Patterns RTG: A Recursive Realistic Graph Generator using Random UCSD'16 (c) 2016, C. Faloutsos 32 Typing Leman Akoglu and Christos Faloutsos. PKDD 09.

33 MORE Graph Patterns Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics (Ed.: Charu Aggarwal) Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool. UCSD'16 (c) 2016, C. Faloutsos 33

34 Roadmap Introduction Motivation Part#1: Patterns in graphs Patterns Anomaly / fraud detection No labels spectral Patterns With labels: Belief Propagation Part#2: time-evolving graphs; tensors Conclusions anomalies UCSD'16 (c) 2016, C. Faloutsos 34

35 How to find suspicious groups? blocks are normal, right? idols fans UCSD'16 (c) 2016, C. Faloutsos 35

36 Except that: blocks are normal, right? hyperbolic communities are more realistic [Araujo+, PKDD 14] UCSD'16 (c) 2016, C. Faloutsos 36

37 Except that: blocks are usually suspicious hyperbolic communities are more realistic [Araujo+, PKDD 14] Q: Can we spot blocks, easily? UCSD'16 (c) 2016, C. Faloutsos 37

38 Except that: blocks are usually suspicious hyperbolic communities are more realistic [Araujo+, PKDD 14] Q: Can we spot blocks, easily? A: Silver bullet: SVD! UCSD'16 (c) 2016, C. Faloutsos 38

39 Crush intro to SVD Recall: (SVD) matrix factorization: finds blocks M idols music lovers singers sports lovers athletes citizens politicians N fans ~ + + UCSD'16 (c) 2016, C. Faloutsos 39

40 Crush intro to SVD Recall: (SVD) matrix factorization: finds blocks M idols music lovers singers sports lovers athletes citizens politicians N fans ~ + + UCSD'16 (c) 2016, C. Faloutsos 40

41 Inferring Strange Behavior from Connectivity Pattern in Social Networks Meng Jiang, Peng Cui, Shiqiang Yang (Tsinghua, Beijing) Alex Beutel, Christos Faloutsos (CMU)

42 Lockstep and Spectral Subspace Plot Case #0: No lockstep behavior in random power law graph of 1M nodes, 3M edges Random Scatter Adjacency Matrix Spectral Subspace Plot UCSD'16 (c) 2016, C. Faloutsos 42

43 Lockstep and Spectral Subspace Plot Case #1: non-overlapping lockstep Blocks Rays Adjacency Matrix Spectral Subspace Plot UCSD'16 (c) 2016, C. Faloutsos 43

44 Lockstep and Spectral Subspace Plot Case #2: non-overlapping lockstep Blocks; low density Elongation Adjacency Matrix Spectral Subspace Plot UCSD'16 (c) 2016, C. Faloutsos 44

45 Lockstep and Spectral Subspace Plot Case #3: non-overlapping lockstep Camouflage (or Fame ) Rays Adjacency Matrix Tilting Spectral Subspace Plot UCSD'16 (c) 2016, C. Faloutsos 45

46 Lockstep and Spectral Subspace Plot Case #3: non-overlapping lockstep Camouflage (or Fame ) Rays Adjacency Matrix Tilting Spectral Subspace Plot UCSD'16 (c) 2016, C. Faloutsos 46

47 Lockstep and Spectral Subspace Plot Case #4:? lockstep? Pearls Adjacency Matrix Spectral Subspace Plot? UCSD'16 (c) 2016, C. Faloutsos 47

48 Tencent Weibo Dataset 117 million nodes (with profile and UGC data) 3.33 billion directed edges UCSD'16 (c) 2016, C. Faloutsos 49

49 Rays Real Data Block Pearls Staircase UCSD'16 (c) 2016, C. Faloutsos 50

50 Real Data Spikes on the out-degree distribution UCSD'16 (c) 2016, C. Faloutsos 51

51 Roadmap Introduction Motivation Part#1: Patterns in graphs Patterns Anomaly / fraud detection No labels spectral methods Suspiciousness With labels: Belief Propagation Part#2: time-evolving graphs; tensors Conclusions UCSD'16 (c) 2016, C. Faloutsos 52

52 Suspicious Patterns in Event Data 2-modes n-modes??? A General Suspiciousness Metric for Dense Blocks in Multimodal Data, Meng Jiang, Alex Beutel, Peng Cui, Bryan UCSD'16 (c) 2016, C. Faloutsos 53 Hooi, Shiqiang Yang, and Christos Faloutsos, ICDM, 2015.

53 ICDM 2015 Suspicious Patterns in Event Data 20,000 Users Retweeting same 20 tweets 6 times each All in 10 hours Which is more suspicious? vs. 225 Users Retweeting same 1 tweet 15 times each All in 3 hours All from 2 IP addresses Answer: volume * D KL (p p background ) UCSD'16 (c) 2016, C. Faloutsos 54

54 rossspot CMU SCS ICDM 2015 HOSVD Suspicious Patterns in Event Data ) 6 0 blocks VD seed in finding HOSVD. ever, the cks such though omposiata: We number 2=1, We data, so, d mass 225 Users 200 Minutes (a) Robustness 27,313 Retweets 10:09 10:14 10:19 10:24 10:32 10:37 11:04 15:37 21:13 John123 l : : l l l Joe456 l :(b) l Convergence : : : : Job789 l : l : l l l Jack135 : l : : : l : l Jill246 : l l l : Jane314 : : l l l l l Jay357 l l l l : l Fig. 4. CROSSSPOT is robust to the number of random seeds. In detecting the 4 low-order blocks, when we use 41 seeds, the best F1 score has reported the final result of as many as 1,000 seeds. CROSSSPOT converges very fast: the average number. of iterations is Fig. 1. Density multiple modes is suspicious. Left: A dense block of 225 Retweeting: users on Tencent Weibo Galaxy (Chinese Twitter) Note retweeting Dream one tweet Project: 27,313 times TABLE from 2 IPVI. addresses BIG over DENSE 200 minutes. BLOCKS Right: WITH magnification TOP METRIC of a subset VALUES of this block. Happy l and DISCOVERED : Happy indicate the INLife two THEIPRETWEETING Traveling addresses used. DATASET. Notice the how World synchronized the behavior is, across several modes (IP-address, user-id, timestamp). # User tweet IP minute Mass c Suspiciousness Informal Problem ,114 1 (Suspiciousness score) 41,396Given 1,239,865 a K- mode dataset 2 (tensor) X, with counts 27,313 of events (that 777,781 are non-negative 3 integer ,872 values), and two subtensors 17,701 Y 1 and 491,323 Y 2, which is more 1 suspicious and worthy of further 3,582 investigation? 131,113 CROSSSPOT HOSVD ,942 74, , ,211 Why multimodal data (tensor): Graphs and social networks have attracted huge interest - and they are perfectly modeled as K=2 mode datasets, that is, matrices. With K=2 modes we can model Twitter s who-follows-whom network [1][2], Face- UCSD'16 (c) 2016, C. Faloutsos 55 Figure 4(a) shows the best F1 score for different numbers of random seeds. We find that when we use 41 random seeds, the...

55 Roadmap Introduction Motivation Part#1: Patterns in graphs Patterns Anomaly / fraud detection No labels spectral methods With labels: Belief Propagation Part#2: time-evolving graphs; tensors Conclusions UCSD'16 (c) 2016, C. Faloutsos 56

56 E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU [www 07] UCSD'16 (c) 2016, C. Faloutsos 57

57 E-bay Fraud detection UCSD'16 (c) 2016, C. Faloutsos 58

58 E-bay Fraud detection UCSD'16 (c) 2016, C. Faloutsos 59

59 E-bay Fraud detection - NetProbe UCSD'16 (c) 2016, C. Faloutsos 60

60 Popular press And less desirable attention: from Belgium police ( copy of your code? ) UCSD'16 (c) 2016, C. Faloutsos 61

61 Roadmap Introduction Motivation Part#1: Patterns in graphs Patterns Anomaly / fraud detection No labels - Spectral methods w/ labels: Belief Propagation closed formulas Part#2: time-evolving graphs; tensors Conclusions UCSD'16 (c) 2016, C. Faloutsos 62

62 Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms Danai Koutra* U Kang Hsing-Kuo Kenneth Pao Tai-You Ke Duen Horng (Polo) Chau Christos Faloutsos (*KDD dissertation award, 2016) ECML PKDD, 5-9 September 2011, Athens, Greece

63 Problem Definition: GBA techniques Given: Graph; & few labeled nodes Find: labels of rest (assuming network effects) UCSD'16 (c) 2016, C. Faloutsos 64

64 Are they related? RWR (Random Walk with Restarts) google s pagerank ( if my friends are important, I m important, too ) SSL (Semi-supervised learning) minimize the differences among neighbors BP (Belief propagation) send messages to neighbors, on what you believe about them UCSD'16 (c) 2016, C. Faloutsos 65

65 Are they related? YES! RWR (Random Walk with Restarts) google s pagerank ( if my friends are important, I m important, too ) SSL (Semi-supervised learning) minimize the differences among neighbors BP (Belief propagation) send messages to neighbors, on what you believe about them UCSD'16 (c) 2016, C. Faloutsos 66

66 Correspondence of Methods Method Matrix Unknown known RWR [I c AD -1 ] x = (1-c)y SSL [I + a(d - A)] x = y FABP [I + a D - c A] b h = φ h d1 d2 d adjacency matrix? final labels/ beliefs 0 1 prior labels/ beliefs UCSD'16 (c) 2016, C. Faloutsos 67

67 Problem: e-commerce ratings fraud Given a heterogeneous graph on users, products, sellers and positive/negative ratings with seed labels Find the top k most fraudulent users, products and sellers Dhivya Eswaran, Stephan Günnemann, Christos Faloutsos, ZooBP: Belief Propagation for Heterogeneous Networks, In UCSD'16 (c) 2016, C. Faloutsos 68 submission to VLDB 2017

68 ZooBP: Salient features Fast approximate heterogeneous belief propagation with precise convergence guarantees! Near-perfect accuracy Scalable: linear in graph size Dhivya Eswaran, Stephan Günnemann, Christos Faloutsos, ZooBP: Belief Propagation for Heterogeneous Networks, In UCSD'16 (c) 2016, C. Faloutsos 69 submission to VLDB 2017

69 Summary of Part#1 *many* patterns in real graphs Power-laws everywhere Long (and growing) list of tools for anomaly/ fraud detection Patterns anomalies UCSD'16 (c) 2016, C. Faloutsos 70

70 Roadmap Introduction Motivation Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors P2.1: time-evolving graphs P2.2: inter-arrival time patterns Conclusions UCSD'16 (c) 2016, C. Faloutsos 71

71 Part 2: Time evolving graphs; tensors UCSD'16 (c) 2016, C. Faloutsos 72

72 Graphs over time -> tensors! Problem #2.1: Given who calls whom, and when Find patterns / anomalies smith UCSD'16 (c) 2016, C. Faloutsos 73

73 Graphs over time -> tensors! Problem #2.1: Given who calls whom, and when Find patterns / anomalies UCSD'16 (c) 2016, C. Faloutsos 74

74 Graphs over time -> tensors! Problem #2.1: Given who calls whom, and when Find patterns / anomalies Tue Mon UCSD'16 (c) 2016, C. Faloutsos 75

75 Graphs over time -> tensors! Problem #2.1: Given who calls whom, and when Find patterns / anomalies caller callee UCSD'16 (c) 2016, C. Faloutsos 76

76 Graphs over time -> tensors! Problem #2.1 : Given author-keyword-date Find patterns / anomalies MANY more settings, author with >2 modes keyword UCSD'16 (c) 2016, C. Faloutsos 77

77 Graphs over time -> tensors! Problem #2.1 : Given subject verb object facts Find patterns / anomalies MANY more settings, subject with >2 modes object UCSD'16 (c) 2016, C. Faloutsos 78

78 Graphs over time -> tensors! Problem #2.1 : Given <triplets> Find patterns / anomalies MANY more settings, mode1 with >2 modes (and 4, 5, etc modes) mode2 UCSD'16 (c) 2016, C. Faloutsos 79

79 Roadmap Introduction Motivation Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors P2.1: time-evolving graphs P2.2: inter-arrival time patterns Conclusions UCSD'16 (c) 2016, C. Faloutsos 82

80 Crush intro to SVD Recall: (SVD) matrix factorization: finds blocks M idols music lovers singers sports lovers athletes citizens politicians N fans ~ + + UCSD'16 (c) 2016, C. Faloutsos 83

81 Answer to both: tensor factorization PARAFAC decomposition politicians artists athletes subject = + + object UCSD'16 (c) 2016, C. Faloutsos 84

82 Answer: tensor factorization PARAFAC decomposition Results for who-calls-whom-when 4M x 15 days?????? caller = + + callee UCSD'16 (c) 2016, C. Faloutsos 85

83 Anomaly detection in timeevolving graphs = Anomalous communities in phone call data: European country, 4M clients, data over 2 weeks 1 caller 5 receivers 4 days of activity ~200 calls to EACH receiver on EACH day! UCSD'16 (c) 2016, C. Faloutsos 86

84 Anomaly detection in timeevolving graphs = Anomalous communities in phone call data: European country, 4M clients, data over 2 weeks 1 caller 5 receivers 4 days of activity ~200 calls to EACH receiver on EACH day! UCSD'16 (c) 2016, C. Faloutsos 87

85 Anomaly detection in timeevolving graphs = Anomalous communities in phone call data: European country, 4M clients, data over 2 weeks Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast ~200 calls to EACH receiver on EACH day! Automatic Discovery of Temporal (Comet) Communities. UCSD'16 (c) 2016, C. Faloutsos 88 PAKDD 2014, Tainan, Taiwan.

86 Roadmap Introduction Motivation Why study (big) graphs? Part#1: Patterns in graphs Part#2: time-evolving graphs; P2.1: tensors P2.2: Inter-arrival time patterns Acknowledgements and Conclusions UCSD'16 (c) 2016, C. Faloutsos 89

87 Universidade de São Paulo KDD 2015 Sydney, Australia RSC: Mining and Modeling Temporal Activity in Social Media Alceu F. Costa * Yuto Yamaguchi Agma J. M. Traina Caetano Traina Jr. Christos Faloutsos * alceufc@icmc.usp.br

88 Pattern Mining: Datasets Reddit Dataset Time-stamp from comments 21,198 users 20 Million time-stamps Twitter Dataset Time-stamp from tweets 6,790 users 16 Million time-stamps For each user we have: Sequence of postings time-stamps: T = (t 1, t 2, t 3, ) Inter-arrival times (IAT) of postings: ( 1, 2, 3, ) UCSD'16 t 1 t 2 t 3 t 4 (c) 2016, C. Faloutsos time 91

89 Pattern Mining Pattern 1: Distribution of IAT is heavy-tailed Users can be inactive for long periods of time before making new postings IAT Complementary Cumulative Distribution Function (CCDF) (log-log axis) CCDF, P(x ) UCSD' day 10 days CCDF, P(x ) , IAT (seconds) (c) 2016, C. Faloutsos Reddit Users day10 days , IAT (seconds) Twitter Users 92

90 Pattern Mining Pattern 1: Distribution of IAT is heavy-tailed Users can be inactive for long periods of time before making new postings No surprises IAT Complementary Should Cumulative we Distribution give up? Function (CCDF) (log-log axis) CCDF, P(x ) UCSD' day 10 days CCDF, P(x ) , IAT (seconds) (c) 2016, C. Faloutsos Reddit Users day10 days , IAT (seconds) Twitter Users 93

91 Human? Robots? linear log UCSD'16 (c) 2016, C. Faloutsos 95

92 linear Human? Robots? 2 3h 1day log UCSD'16 (c) 2016, C. Faloutsos 96

93 Experiments: Can RSC-Spotter Detect Bots? Precision vs. Sensitivity Curves Good performance: curve close to the top Precision Sensitivity (Recall) UCSD'16 RSC Spotter IAT Hist. Twitter Entropy [6] Weekday Hist. (c) 2016, C. Faloutsos Precision > 94% Sensitivity > 70% With strongly imbalanced datasets # humans >> # bots All Features 97

94 Experiments: Can RSC-Spotter Detect Bots? Precision vs. Sensitivity Curves Good performance: curve close to the top Precision Reddit Sensitivity (Recall) Precision > 96% Sensitivity > 47% With strongly imbalanced datasets # humans >> # bots UCSD'16 RSC Spotter IAT Hist. Entropy [6] Weekday Hist. (c) 2016, C. Faloutsos All Features 98

95 Part 2: Conclusions Time-evolving / heterogeneous graphs -> tensors PARAFAC finds patterns (GigaTensor/HaTen2 -> fast & scalable) = UCSD'16 (c) 2016, C. Faloutsos 99

96 Roadmap Introduction Motivation Why study (big) graphs? Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors Acknowledgements and Conclusions UCSD'16 (c) 2016, C. Faloutsos 100

97 Thanks Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies Thanks to: NSF IIS , IIS , CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, UCSD'16 (c) 2016, C. Faloutsos 101 Google, INTEL, HP, ilab

98 Cast Akoglu, Leman Araujo, Miguel Beutel, Alex Chau, Polo Eswaran, Dhivya Hooi, Bryan Kang, U Koutra, Danai Papalexakis, Vagelis Shah, Neil Shin, Kijung Song, Hyun Ah UCSD'16 (c) 2016, C. Faloutsos 102

99 CONCLUSION#1 Big data Patterns Anomalies Large datasets reveal patterns/outliers that are invisible otherwise UCSD'16 (c) 2016, C. Faloutsos 103

100 CONCLUSION#2 tensors powerful tool = 1 caller 5 receivers 4 days of activity UCSD'16 (c) 2016, C. Faloutsos 104

101 References D. Chakrabarti, C. Faloutsos: Graph Mining Laws, Tools and Case Studies, Morgan Claypool S00449ED1V01Y201209DMK006 UCSD'16 (c) 2016, C. Faloutsos 105

102 TAKE HOME MESSAGE: Cross-disciplinarity = UCSD'16 (c) 2016, C. Faloutsos 106

103 Thank you! Cross-disciplinarity = UCSD'16 (c) 2016, C. Faloutsos 107

CMU SCS. Large Graph Mining. Christos Faloutsos CMU

CMU SCS. Large Graph Mining. Christos Faloutsos CMU Large Graph Mining Christos Faloutsos CMU Thank you! Hillol Kargupta NGDM 2007 C. Faloutsos 2 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; fraud

More information

Node similarity and classification

Node similarity and classification Node similarity and classification Davide Mottin, Anton Tsitsulin HassoPlattner Institute Graph Mining course Winter Semester 2017 Acknowledgements Some part of this lecture is taken from: http://web.eecs.umich.edu/~dkoutra/tut/icdm14.html

More information

Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities

Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities Miguel Araujo 1,5, Spiros Papadimitriou 2, Stephan Günnemann 1, Christos Faloutsos 1, Prithwish Basu 3, Ananthram Swami 4, Evangelos E.

More information

CMU SCS Mining Billion-node Graphs: Patterns, Generators and Tools

CMU SCS Mining Billion-node Graphs: Patterns, Generators and Tools Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU (on sabbatical at google) Tamara Kolda Thank you! C. Faloutsos (CMU + Google) 2 Our goal: Open source system for mining

More information

Data mining in large graphs

Data mining in large graphs Data mining in large graphs Christos Faloutsos University www.cs.cmu.edu/~christos ALLADIN 2003 C. Faloutsos 1 Outline Introduction - motivation Patterns & Power laws Scalability & Fast algorithms Fractals,

More information

Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities

Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities Miguel Araujo CMU/University of Porto maraujo@cs.cmu.edu Christos Faloutsos* Carnegie Mellon University christos@cs.cmu.edu Spiros Papadimitriou

More information

Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities

Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities Miguel Araujo 1,5, Spiros Papadimitriou 2, Stephan Günnemann 1, Christos Faloutsos 1, Prithwish Basu 3, Ananthram Swami 4, Evangelos E.

More information

arxiv: v1 [cs.ai] 19 Nov 2015

arxiv: v1 [cs.ai] 19 Nov 2015 BIRDNEST: Bayesian Inference for Ratings-Fraud Detection Bryan Hooi Neil Shah Alex Beutel Stephan Günneman Leman Akoglu Mohit Kumar Disha Makhija Christos Faloutsos arxiv:1511.06030v1 [cs.ai] 19 Nov 2015

More information

Motivation Data mining: ~ find patterns (rules, outliers) Problem#1: How do real graphs look like? Problem#2: How do they evolve? Problem#3: How to ge

Motivation Data mining: ~ find patterns (rules, outliers) Problem#1: How do real graphs look like? Problem#2: How do they evolve? Problem#3: How to ge Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU Thank you! Sharad Mehrotra UCI 2007 C. Faloutsos 2 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece

More information

THE POWER OF CERTAINTY

THE POWER OF CERTAINTY A DIRICHLET-MULTINOMIAL APPROACH TO BELIEF PROPAGATION Dhivya Eswaran* CMU deswaran@cs.cmu.edu Stephan Guennemann TUM guennemann@in.tum.de Christos Faloutsos CMU christos@cs.cmu.edu PROBLEM MOTIVATION

More information

Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms

Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms Danai Koutra 1, Tai-You Ke 2,U.Kang 1, Duen Horng (Polo) Chau 1, Hsing-Kuo Kenneth Pao 2, and Christos Faloutsos 1 1 School of Computer

More information

Must-read Material : Multimedia Databases and Data Mining. Indexing - Detailed outline. Outline. Faloutsos

Must-read Material : Multimedia Databases and Data Mining. Indexing - Detailed outline. Outline. Faloutsos Must-read Material 15-826: Multimedia Databases and Data Mining Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. Technical Report SAND2007-6702, Sandia National Laboratories,

More information

CMU SCS Talk 3: Graph Mining Tools Tensors, communities, parallelism

CMU SCS Talk 3: Graph Mining Tools Tensors, communities, parallelism Talk 3: Graph Mining Tools Tensors, communities, parallelism Christos Faloutsos CMU Overall Outline Introduction Motivation Talk#1: Patterns in graphs; generators Talk#2: Tools (Ranking, proximity) Talk#3:

More information

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu Mary McGlohon Christos Faloutsos Carnegie Mellon University, School of Computer Science {lakoglu, mmcgloho, christos}@cs.cmu.edu

More information

Graph-Based Fraud Detection in the Face of Camouflage

Graph-Based Fraud Detection in the Face of Camouflage Graph-Based Fraud Detection in the Face of Camouflage BRYAN HOOI, KIJUNG SHIN, HYUN AH SONG, ALEX BEUTEL, NEIL SHAH, and CHRISTOS FALOUTSOS, Carnegie Mellon University Given a bipartite graph of users

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

A Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining

A Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining A Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining KIJUNG SHIN, BRYAN HOOI and CHRISTOS FALOUTSOS, Carnegie Mellon University Given a large-scale and high-order tensor, how can we detect

More information

Anomaly Detection. Davide Mottin, Konstantina Lazaridou. HassoPlattner Institute. Graph Mining course Winter Semester 2016

Anomaly Detection. Davide Mottin, Konstantina Lazaridou. HassoPlattner Institute. Graph Mining course Winter Semester 2016 Anomaly Detection Davide Mottin, Konstantina Lazaridou HassoPlattner Institute Graph Mining course Winter Semester 2016 and Next week February 7, third round presentations Slides are due by February 6,

More information

Fraud Detection using Graph Topology and Temporal Spikes

Fraud Detection using Graph Topology and Temporal Spikes Fraud Detection using Graph Topology and Temporal Spikes Shenghua Liu, 1,2 Bryan Hooi, 2 Christos Faloutsos 2 1 CAS Key Laboratory of Network Data Science & Technology, Institute of Computing Technology,

More information

Mining Triadic Closure Patterns in Social Networks

Mining Triadic Closure Patterns in Social Networks Mining Triadic Closure Patterns in Social Networks Hong Huang, University of Goettingen Jie Tang, Tsinghua University Sen Wu, Stanford University Lu Liu, Northwestern University Xiaoming Fu, University

More information

Discovery of comet communities in temporal and labeled graphs (Com 2 )

Discovery of comet communities in temporal and labeled graphs (Com 2 ) Under consideration for publication in Knowledge and Information Systems Discovery of comet communities in temporal and labeled graphs (Com 2 ) Miguel Araujo 1,2, Stephan Günnemann 1, Spiros Papadimitriou

More information

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Jimeng Sun Spiros Papadimitriou Philip S. Yu Carnegie Mellon University Pittsburgh, PA, USA IBM T.J. Watson Research Center Hawthorne,

More information

MalSpot: Multi 2 Malicious Network Behavior Patterns Analysis

MalSpot: Multi 2 Malicious Network Behavior Patterns Analysis MalSpot: Multi 2 Malicious Network Behavior Patterns Analysis Ching-Hao Mao 1, Chung-Jung Wu 1, Evangelos E. Papalexakis 2, Christos Faloutsos 2, and Kuo-Chen Lee 1 1 Institute for Information Industry,

More information

TENSORSPLAT: Spotting Latent Anomalies in Time

TENSORSPLAT: Spotting Latent Anomalies in Time TENSORSPLAT: Spotting Latent Anomalies in Time Danai Koutra, Evangelos E. Papalexakis, Christos Faloutsos School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA Email: {danai, epapalex,

More information

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank

More information

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank

More information

GIVEN a tensor that is too large to fit in memory, how can

GIVEN a tensor that is too large to fit in memory, how can Out-of-Core and Distributed Algorithms for Dense Subtensor Mining Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos arxiv:802.0065v2 [cs.db] 6 Feb 208 Abstract How can we detect fraudulent lockstep

More information

Temporal Multi-View Inconsistency Detection for Network Traffic Analysis

Temporal Multi-View Inconsistency Detection for Network Traffic Analysis WWW 15 Florence, Italy Temporal Multi-View Inconsistency Detection for Network Traffic Analysis Houping Xiao 1, Jing Gao 1, Deepak Turaga 2, Long Vu 2, and Alain Biem 2 1 Department of Computer Science

More information

Interact with Strangers

Interact with Strangers Interact with Strangers RATE: Recommendation-aware Trust Evaluation in Online Social Networks Wenjun Jiang 1, 2, Jie Wu 2, and Guojun Wang 1 1. School of Information Science and Engineering, Central South

More information

Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data

Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data SDM 15 Vancouver, CAN Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data Houping Xiao 1, Yaliang Li 1, Jing Gao 1, Fei Wang 2, Liang Ge 3, Wei Fan 4, Long

More information

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Submitted for Blind Review Abstract How do real, weighted graphs change over time? What patterns, if any, do they obey? Earlier studies

More information

Faloutsos, Tong ICDE, 2009

Faloutsos, Tong ICDE, 2009 Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity

More information

Outline : Multimedia Databases and Data Mining. Indexing - similarity search. Indexing - similarity search. C.

Outline : Multimedia Databases and Data Mining. Indexing - similarity search. Indexing - similarity search. C. Outline 15-826: Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos Goal: Find similar / interesting things Intro to DB Indexing - similarity search Points Text Time sequences; images

More information

Anomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach

Anomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach Proceedings 1st International Workshop on Advanced Analytics and Learning on Temporal Data AALTD 2015 Anomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach Anna

More information

HEigen: Spectral Analysis for Billion-Scale Graphs

HEigen: Spectral Analysis for Billion-Scale Graphs IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. XX, NO. XX, MONTH 2012 1 HEigen: Spectral Analysis for Billion-Scale Graphs U Kang, Brendan Meeder, Evangelos E. Papalexakis, and Christos Faloutsos

More information

Tracking the Intrinsic Dimension of Evolving Data Streams to Update Association Rules

Tracking the Intrinsic Dimension of Evolving Data Streams to Update Association Rules Tracking the Intrinsic Dimension of Evolving Data Streams to Update Association Rules Elaine P. M. de Sousa, Marcela X. Ribeiro, Agma J. M. Traina, and Caetano Traina Jr. Department of Computer Science

More information

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci Link Analysis Information Retrieval and Data Mining Prof. Matteo Matteucci Hyperlinks for Indexing and Ranking 2 Page A Hyperlink Page B Intuitions The anchor text might describe the target page B Anchor

More information

MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL

MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL Jing (Selena) He Department of Computer Science, Kennesaw State University Shouling Ji,

More information

Patterns amongst Competing Task Frequencies: Super-Linearities, and the Almond-DG model

Patterns amongst Competing Task Frequencies: Super-Linearities, and the Almond-DG model Patterns amongst Competing Task Frequencies: Super-Linearities, and the Almond-DG model Danai Koutra, Vasileios Koutras, B. Aditya Prakash, and Christos Faloutsos 1 Computer Science Dept., Carnegie Mellon

More information

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze Link Analysis Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 The Web as a Directed Graph Page A Anchor hyperlink Page B Assumption 1: A hyperlink between pages

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 6: Numerical Linear Algebra: Applications in Machine Learning Cho-Jui Hsieh UC Davis April 27, 2017 Principal Component Analysis Principal

More information

Yahoo! Labs Nov. 1 st, Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University

Yahoo! Labs Nov. 1 st, Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Yahoo! Labs Nov. 1 st, 2012 Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Motivation Modeling Social Streams Future work Motivation Modeling Social Streams

More information

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 21, 2014 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 21, 2014 1 / 52 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

Problem (INFORMAL). Given a dynamic graph, find a set of possibly overlapping temporal subgraphs to concisely describe the given dynamic graph in a

Problem (INFORMAL). Given a dynamic graph, find a set of possibly overlapping temporal subgraphs to concisely describe the given dynamic graph in a Outlines TimeCrunch: Interpretable Dynamic Graph Summarization by Neil Shah et. al. (KDD 2015) From Micro to Macro: Uncovering and Predicting Information Cascading Process with Behavioral Dynamics by Linyun

More information

BEYOND ASSORTATIVITY PROCLIVITY INDEX FOR ATTRIBUTED NETWORKS (PRONE) Reihaneh Rabbany Dhivya Eswaran*

BEYOND ASSORTATIVITY PROCLIVITY INDEX FOR ATTRIBUTED NETWORKS (PRONE) Reihaneh Rabbany Dhivya Eswaran* PROCLIVITY INDEX FOR ATTRIBUTED NETWORKS (PRONE) Reihaneh Rabbany rrabbany@andrew.cmu.edu Artur W. Dubrawski awd@andrew.cmu.edu Dhivya Eswaran* deswaran@cs.cmu.edu Christos Faloutsos christos@cs.cmu.edu

More information

Time Series. CSE 6242 / CX 4242 Mar 27, Duen Horng (Polo) Chau Georgia Tech. Nonlinear Forecasting; Visualization; Applications

Time Series. CSE 6242 / CX 4242 Mar 27, Duen Horng (Polo) Chau Georgia Tech. Nonlinear Forecasting; Visualization; Applications CSE 6242 / CX 4242 Mar 27, 2014 Time Series Nonlinear Forecasting; Visualization; Applications Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon,

More information

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search 6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search Daron Acemoglu and Asu Ozdaglar MIT September 30, 2009 1 Networks: Lecture 7 Outline Navigation (or decentralized search)

More information

Multimedia analysis and retrieval

Multimedia analysis and retrieval Multimedia analysis and retrieval from the masses, and for the masses Lexing Xie IBM T J Watson Research Center xlx@us.ibm.com ICIP-MIR workshop, Oct 12, 2008 2008 IBM Research multimedia information analysis

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

arxiv: v1 [cs.si] 11 Jun 2017

arxiv: v1 [cs.si] 11 Jun 2017 arxiv:176.3374v1 [cs.si] 11 Jun 217 DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams ABSTRACT Kijung Shin, Bryan Hooi, Jisu Kim, Christos Faloutsos School of Computer Science, Carnegie

More information

From Social User Activities to People Affiliation

From Social User Activities to People Affiliation 2013 IEEE 13th International Conference on Data Mining From Social User Activities to People Affiliation Guangxiang Zeng 1, Ping uo 2, Enhong Chen 1 and Min Wang 3 1 University of Science and Technology

More information

Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA

Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA Rutgers, The State University of New Jersey Nov. 12, 2012 Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA Motivation Modeling Social Streams Future

More information

Modeling Dynamic Evolution of Online Friendship Network

Modeling Dynamic Evolution of Online Friendship Network Commun. Theor. Phys. 58 (2012) 599 603 Vol. 58, No. 4, October 15, 2012 Modeling Dynamic Evolution of Online Friendship Network WU Lian-Ren ( ) 1,2, and YAN Qiang ( Ö) 1 1 School of Economics and Management,

More information

Link Analysis Ranking

Link Analysis Ranking Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query

More information

Cloud-Based Computational Bio-surveillance Framework for Discovering Emergent Patterns From Big Data

Cloud-Based Computational Bio-surveillance Framework for Discovering Emergent Patterns From Big Data Cloud-Based Computational Bio-surveillance Framework for Discovering Emergent Patterns From Big Data Arvind Ramanathan ramanathana@ornl.gov Computational Data Analytics Group, Computer Science & Engineering

More information

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan Link Prediction Eman Badr Mohammed Saquib Akmal Khan 11-06-2013 Link Prediction Which pair of nodes should be connected? Applications Facebook friend suggestion Recommendation systems Monitoring and controlling

More information

Web Structure Mining Nodes, Links and Influence

Web Structure Mining Nodes, Links and Influence Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.

More information

CMU SCS : Multimedia Databases and Data Mining CMU SCS Must-read Material CMU SCS Must-read Material (cont d)

CMU SCS : Multimedia Databases and Data Mining CMU SCS Must-read Material CMU SCS Must-read Material (cont d) 15-826: Multimedia Databases and Data Mining Lecture #26: Graph mining - patterns Christos Faloutsos Must-read Material Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, On Power-Law Relationships

More information

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018 Lab 8: Measuring Graph Centrality - PageRank Monday, November 5 CompSci 531, Fall 2018 Outline Measuring Graph Centrality: Motivation Random Walks, Markov Chains, and Stationarity Distributions Google

More information

DM-Group Meeting. Subhodip Biswas 10/16/2014

DM-Group Meeting. Subhodip Biswas 10/16/2014 DM-Group Meeting Subhodip Biswas 10/16/2014 Papers to be discussed 1. Crowdsourcing Land Use Maps via Twitter Vanessa Frias-Martinez and Enrique Frias-Martinez in KDD 2014 2. Tracking Climate Change Opinions

More information

Two heads better than one: Pattern Discovery in Time-evolving Multi-Aspect Data

Two heads better than one: Pattern Discovery in Time-evolving Multi-Aspect Data Two heads better than one: Pattern Discovery in Time-evolving Multi-Aspect Data Jimeng Sun 1, Charalampos E. Tsourakakis 2, Evan Hoke 4, Christos Faloutsos 2, and Tina Eliassi-Rad 3 1 IBM T.J. Watson Research

More information

Degree (k)

Degree (k) 0 1 Pr(X k) 0 0 1 Degree (k) Figure A1: Log-log plot of the complementary cumulative distribution function (CCDF) of the degree distribution for a sample month (January 0) network is shown (blue), along

More information

Facebook Friends! and Matrix Functions

Facebook Friends! and Matrix Functions Facebook Friends! and Matrix Functions! Graduate Research Day Joint with David F. Gleich, (Purdue), supported by" NSF CAREER 1149756-CCF Kyle Kloster! Purdue University! Network Analysis Use linear algebra

More information

GRIDWATCH: Sensor Placement and Anomaly Detection in the Electrical Grid

GRIDWATCH: Sensor Placement and Anomaly Detection in the Electrical Grid GRIDWATCH: Sensor Placement and Anomaly Detection in the Electrical Grid Bryan Hooi 2, Dhivya Eswaran, Hyun Ah Song, Amritanshu Pandey 3, Marko Jereminov 3, Larry Pileggi 3, and Christos Faloutsos School

More information

8th Grade Standards for Algebra Readiness Solve one-variable linear equations. 3. d 8 = 6 4. = 16 5.

8th Grade Standards for Algebra Readiness Solve one-variable linear equations. 3. d 8 = 6 4. = 16 5. Solve one-variable linear equations. 1. g 7 = 15 2. p 1 3 = 2 3 3. d 8 = 6 4. 2r 3 = 16 5. 3r 4 = 1 8 6. 4x + 7 = 11 7. 13 = h 7 8. 2(b + 5) = 6 9. 5(r 1) = 2(r 4) 6 Solve one- and two-step linear inequalities

More information

Influence Maximization in Dynamic Social Networks

Influence Maximization in Dynamic Social Networks Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang and Xiaoming Sun Department of Computer Science and Technology, Tsinghua University Department of Computer

More information

High Performance Parallel Tucker Decomposition of Sparse Tensors

High Performance Parallel Tucker Decomposition of Sparse Tensors High Performance Parallel Tucker Decomposition of Sparse Tensors Oguz Kaya INRIA and LIP, ENS Lyon, France SIAM PP 16, April 14, 2016, Paris, France Joint work with: Bora Uçar, CNRS and LIP, ENS Lyon,

More information

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity CS322 Project Writeup Semih Salihoglu Stanford University 353 Serra Street Stanford, CA semih@stanford.edu

More information

Integrated Anchor and Social Link Predictions across Social Networks

Integrated Anchor and Social Link Predictions across Social Networks Proceedings of the TwentyFourth International Joint Conference on Artificial Intelligence IJCAI 2015) Integrated Anchor and Social Link Predictions across Social Networks Jiawei Zhang and Philip S. Yu

More information

Large Graph Mining: Power Tools and a Practitioner s guide

Large Graph Mining: Power Tools and a Practitioner s guide Large Graph Mining: Power Tools and a Practitioner s guide Task 5: Graphs over time & tensors Faloutsos, Miller, Tsourakakis CMU KDD '9 Faloutsos, Miller, Tsourakakis P5-1 Outline Introduction Motivation

More information

Large Graph Mining: Power Tools and a Practitioner s guide

Large Graph Mining: Power Tools and a Practitioner s guide Large Graph Mining: Power Tools and a Practitioner s guide Task 3: Recommendations & proximity Faloutsos, Miller & Tsourakakis CMU KDD'09 Faloutsos, Miller, Tsourakakis P3-1 Outline Introduction Motivation

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 29-30, 2015 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 29-30, 2015 1 / 61 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

arxiv: v1 [cs.si] 15 Oct 2017

arxiv: v1 [cs.si] 15 Oct 2017 LookOut on Time-Evolving Graphs: Succinctly Explaining Anomalies from Any Detector Nikhil Gupta IIT Delhi cs5140462@cse.iitd.ac.in Dhivya Eswaran, Neil Shah, Leman Akoglu, Christos Faloutsos Carnegie Mellon

More information

Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data

Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data Jinzhong Wang April 13, 2016 The UBD Group Mobile and Social Computing Laboratory School of Software, Dalian University

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Text Analytics (Text Mining) Concepts, Algorithms, LSI/SVD Duen Horng (Polo) Chau Assistant Professor Associate Director, MS

More information

DS504/CS586: Big Data Analytics Graph Mining II

DS504/CS586: Big Data Analytics Graph Mining II Welcome to DS504/CS586: Big Data Analytics Graph Mining II Prof. Yanhua Li Time: 6:00pm 8:50pm Mon. and Wed. Location: SL105 Spring 2016 Reading assignments We will increase the bar a little bit Please

More information

Kristina Lerman USC Information Sciences Institute

Kristina Lerman USC Information Sciences Institute Rethinking Network Structure Kristina Lerman USC Information Sciences Institute Università della Svizzera Italiana, December 16, 2011 Measuring network structure Central nodes Community structure Strength

More information

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University kborne@gmu.edu, http://classweb.gmu.edu/kborne/ Outline Astroinformatics Example Application:

More information

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee wwlee1@uiuc.edu September 28, 2004 Motivation IR on newsgroups is challenging due to lack of

More information

Less is More: Non-Redundant Subspace Clustering

Less is More: Non-Redundant Subspace Clustering Less is More: Non-Redundant Subspace Clustering Ira Assent Emmanuel Müller Stephan Günnemann Ralph Krieger Thomas Seidl Aalborg University, Denmark DATA MANAGEMENT AND DATA EXPLORATION GROUP RWTH Prof.

More information

Anomaly Detection. Jing Gao. SUNY Buffalo

Anomaly Detection. Jing Gao. SUNY Buffalo Anomaly Detection Jing Gao SUNY Buffalo 1 Anomaly Detection Anomalies the set of objects are considerably dissimilar from the remainder of the data occur relatively infrequently when they do occur, their

More information

On the Precision of Social and Information Networks

On the Precision of Social and Information Networks On the Precision of Social and Information Networks Kamesh Munagala (Duke) Reza Bosagh Zadeh (Stanford) Ashish Goel (Stanford) Aneesh Sharma(Twitter, Inc.) Information Networks Social Networks play an

More information

REV2: Fraudulent User Prediction in Rating Platforms

REV2: Fraudulent User Prediction in Rating Platforms REV2: Fraudulent User Prediction in Rating Platforms Srijan Kumar Stanford University, USA srijan@cs.stanford.edu Bryan Hooi Carnegie Mellon University, USA bhooi@cs.cmu.edu Disha Makhija Flipkart, India

More information

CS224W: Social and Information Network Analysis

CS224W: Social and Information Network Analysis CS224W: Social and Information Network Analysis Reaction Paper Adithya Rao, Gautam Kumar Parai, Sandeep Sripada Keywords: Self-similar networks, fractality, scale invariance, modularity, Kronecker graphs.

More information

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged. Web Data Mining PageRank, University of Szeged Why ranking web pages is useful? We are starving for knowledge It earns Google a bunch of money. How? How does the Web looks like? Big strongly connected

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #9: Link Analysis Seoul National University 1 In This Lecture Motivation for link analysis Pagerank: an important graph ranking algorithm Flow and random walk formulation

More information

StructInf: Mining Structural Influence from Social Streams

StructInf: Mining Structural Influence from Social Streams Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-7) StructInf: Mining Structural Influence from Social Streams Jing Zhang, Jie Tang, Yuanyi Zhong, Yuchen Mo, Juanzi Li,

More information

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Analysis of Networks Jure Leskovec, Stanford University CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 10/30/17 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

More information

IV Course Spring 14. Graduate Course. May 4th, Big Spatiotemporal Data Analytics & Visualization

IV Course Spring 14. Graduate Course. May 4th, Big Spatiotemporal Data Analytics & Visualization Spatiotemporal Data Visualization IV Course Spring 14 Graduate Course of UCAS May 4th, 2014 Outline What is spatiotemporal data? How to analyze spatiotemporal data? How to visualize spatiotemporal data?

More information

Detecting Origin-Destination Mobility Flows From Geotagged Tweets in Greater Los Angeles Area

Detecting Origin-Destination Mobility Flows From Geotagged Tweets in Greater Los Angeles Area Detecting Origin-Destination Mobility Flows From Geotagged Tweets in Greater Los Angeles Area Song Gao 1, Jiue-An Yang 1,2, Bo Yan 1, Yingjie Hu 1, Krzysztof Janowicz 1, Grant McKenzie 1 1 STKO Lab, Department

More information

Detecting Anomalies in Bipartite Graphs with Mutual Dependency Principles

Detecting Anomalies in Bipartite Graphs with Mutual Dependency Principles Detecting Anomalies in Bipartite Graphs with Mutual Dependency Principles Hanbo Dai Feida Zhu Ee-Peng Lim HweeHwa Pang School of Information Systems, Singapore Management University hanbodai28, fdzhu,

More information

Protein Complex Identification by Supervised Graph Clustering

Protein Complex Identification by Supervised Graph Clustering Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie

More information

A Bivariate Point Process Model with Application to Social Media User Content Generation

A Bivariate Point Process Model with Application to Social Media User Content Generation 1 / 33 A Bivariate Point Process Model with Application to Social Media User Content Generation Emma Jingfei Zhang ezhang@bus.miami.edu Yongtao Guan yguan@bus.miami.edu Department of Management Science

More information

Incremental Pattern Discovery on Streams, Graphs and Tensors

Incremental Pattern Discovery on Streams, Graphs and Tensors Incremental Pattern Discovery on Streams, Graphs and Tensors Jimeng Sun Thesis Committee: Christos Faloutsos Tom Mitchell David Steier, External member Philip S. Yu, External member Hui Zhang Abstract

More information

UnSAID: Uncertainty and Structure in the Access to Intensional Data

UnSAID: Uncertainty and Structure in the Access to Intensional Data UnSAID: Uncertainty and Structure in the Access to Intensional Data Pierre Senellart 3 July 214, Univ. Rennes 1 Uncertain data is everywhere Numerous sources of uncertain data: Measurement errors Data

More information

Using Rank Propagation and Probabilistic Counting for Link-based Spam Detection

Using Rank Propagation and Probabilistic Counting for Link-based Spam Detection Using Rank Propagation and Probabilistic for Link-based Luca Becchetti, Carlos Castillo,Debora Donato, Stefano Leonardi and Ricardo Baeza-Yates 2. Università di Roma La Sapienza Rome, Italy 2. Yahoo! Research

More information

Stat 315c: Introduction

Stat 315c: Introduction Stat 315c: Introduction Art B. Owen Stanford Statistics Art B. Owen (Stanford Statistics) Stat 315c: Introduction 1 / 14 Stat 315c Analysis of Transposable Data Usual Statistics Setup there s Y (we ll

More information

Compressing Tabular Data via Pairwise Dependencies

Compressing Tabular Data via Pairwise Dependencies Compressing Tabular Data via Pairwise Dependencies Amir Ingber, Yahoo! Research TCE Conference, June 22, 2017 Joint work with Dmitri Pavlichin, Tsachy Weissman (Stanford) Huge datasets: everywhere - Internet

More information