Estimating Clustering Coefficients and Size of Social Networks via Random Walk


1 Estimating Clustering Coefficients and Size of Social Networks via Random Walk
Stephen J. Hardiman* (Capital Fund Management, France)
Liran Katzir (Advanced Technology Labs, Microsoft Research, Israel)
*Research was conducted while the author was unaffiliated.

2 Motivation: Social Networks
Qzone, Netlog, Google+, Bebo, Twitter, Facebook, Classmates.com, Sina Weibo, Sonico.com, Orkut, Renren, Habbo, Flixster, MyLife, Tagged, Friendster, hi5, LinkedIn, Vkontakte, Plaxo

3 Motivation: External access
Labels: Social Analytics; The online social network; Privacy; Disk Space; Communication.
[Figure: example network on nodes $v_1, \dots, v_9$]

4 Task: Estimate parameters
Global Clustering Coefficient; Network Average CC; Number of Registered Users.
Predicting Social Products Potential. Business development / advertisement / market size.

5 Global Clustering Coefficient
$\text{Global CC} = \dfrac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$
[Figure: example network on nodes $v_1, \dots, v_9$ with a triangle and a connected triplet highlighted]
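The definition above can be checked by brute force on a small graph. The sketch below is only an illustration: the edge list is a hypothetical 9-node graph (the edges of the slide's figure are not recoverable from this transcript), and the helper names `adjacency` and `global_cc` are made up for this example.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical edge list on nodes 1..9 (an assumption, not the slide's figure).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6),
         (5, 7), (6, 7), (7, 8), (7, 9), (8, 9)]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def global_cc(adj):
    """Return 3 * triangles / connected triplets as an exact fraction."""
    closed = 0    # triplets whose end nodes are adjacent (= 3 * triangles)
    triplets = 0  # all connected triplets, one per (center, neighbor pair)
    for nbrs in adj.values():
        for a, b in combinations(sorted(nbrs), 2):
            triplets += 1
            if b in adj[a]:
                closed += 1
    return Fraction(closed, triplets)


ADJ = adjacency(EDGES)
print(global_cc(ADJ))  # 9/19 for this hypothetical graph
```

Each triangle is counted once per center node, which is where the factor 3 in the numerator comes from.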

6 Global Clustering Coefficient
Exact: [Alon et al, 1997].
Estimation, input is read at least once: Random Access: [Avron, 2010]; Streaming Model: [Buriol et al, 2006].
Estimation, sampling: Random Access: [Schank et al, 2005]; External Access: This work.

7 Local Clustering Coefficient
$C_i = \dfrac{\#\text{connections between } v_i\text{'s neighbors}}{d_i (d_i - 1)/2}$, where $d_i$ is the degree of node $i$.
Example: $d_1 = 1$, $d_2 = 3$, $d_9 = 2$, $C_2 = 1/3$.
Network Average CC = average local CC.
[Figure: example network on nodes $v_1, \dots, v_9$]
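A minimal sketch of the local coefficient and its network average follows. The edge list is a hypothetical graph chosen so that $d_1 = 1$, $d_2 = 3$, $d_9 = 2$ and $C_2 = 1/3$ match the values quoted on the slide; the real figure's edges are unknown. Treating nodes of degree below 2 as having $C_i = 0$ is an assumption of this sketch, since the slide does not say how they are handled.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical edge list on nodes 1..9 (an assumption, not the slide's figure).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6),
         (5, 7), (6, 7), (7, 8), (7, 9), (8, 9)]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_cc(adj, v):
    """C_v = links among v's neighbors / (d_v * (d_v - 1) / 2)."""
    d = len(adj[v])
    if d < 2:
        return Fraction(0)  # assumed convention for degree 0 or 1
    links = sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
    return Fraction(links, d * (d - 1) // 2)


def average_cc(adj):
    """Network average CC: the mean of the local coefficients."""
    return sum(local_cc(adj, v) for v in adj) / len(adj)


ADJ = adjacency(EDGES)
print(local_cc(ADJ, 2))  # 1/3
print(average_cc(ADJ))   # 16/27 for this hypothetical graph
```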

8 Network Average CC
Exact: Naïve.
Estimation, input is read at least once: Streaming Model: [Becchetti et al, 2010].
Estimation, sampling: Random Access: [Schank et al, 2005]; External Access: [Ribeiro et al, 2010], [Gjoka et al, 2010], This work (improved accuracy).

9 Number of Registered Users
Exact: trivial.
Estimation, sampling: External Access: [Hardiman et al, 2009], [Katzir et al, 2011], This work (improved accuracy).

10 Random Walk
Sampled nodes: $v_1, v_2, v_3, v_4, v_5$.
Stationary distribution: $\pi_i = \dfrac{d_i}{\sum_j d_j}$.
[Figure: example network on nodes $v_1, \dots, v_9$ with the walk marked]
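The degree-proportional stationary distribution can be verified exactly on a small graph: applying one step of the walk to $\pi_i = d_i / \sum_j d_j$ should return the same distribution. The sketch below assumes a simple random walk (each neighbor chosen uniformly) on a hypothetical 9-node graph; the function names are illustrative.

```python
import random
from fractions import Fraction

# Hypothetical edge list on nodes 1..9 (an assumption, not the slide's figure).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6),
         (5, 7), (6, 7), (7, 8), (7, 9), (8, 9)]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def random_walk(adj, start, steps, rng):
    """Simple random walk: move to a uniformly chosen neighbor each step."""
    walk = [start]
    for _ in range(steps):
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    return walk


def stationary(adj):
    """Degree-proportional distribution pi_i = d_i / D."""
    total = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(nbrs), total) for v, nbrs in adj.items()}


def one_step(adj, dist):
    """Push a distribution through one step of the walk."""
    nxt = {v: Fraction(0) for v in adj}
    for u, p in dist.items():
        for v in adj[u]:
            nxt[v] += p / len(adj[u])
    return nxt


ADJ = adjacency(EDGES)
PI = stationary(ADJ)
print(one_step(ADJ, PI) == PI)  # True: pi is invariant under the walk
print(random_walk(ADJ, 1, 4, random.Random(0)))
```

Only the neighbor lists of visited nodes are read, which is the external-access setting the slides describe.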

11 Random Walk - Summary
Figure legend: Sampled Nodes; Visible Nodes; Invisible Nodes; Visible Edges; Invisible Edges.
[Figure: example network on nodes $v_1, \dots, v_9$]

12 Global CC Algorithm
The estimated global clustering coefficient: $\hat{c}_g = \Phi_g / \Psi_g$.
1. $\Psi_g$: the sampled nodes' average of degree minus one, i.e. the average of $d_k - 1$.
$\phi_k = 1$ if there is an edge $v_{k-1} v_{k+1}$ (iff $v_{k-1}, v_k, v_{k+1}$ is a triangle), $0$ otherwise.
2. $\Phi_g$: the sampled nodes' average of $\phi_k d_k$.
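The two averages can be sketched in a few lines. The code below is an illustration under stated assumptions: the graph is a hypothetical one, $\Psi_g$ is averaged over all $r$ samples and $\Phi_g$ over the $r - 2$ interior samples (this normalization is inferred from the example values $\Psi_g = 7/5$ and $\Phi_g = 2/3$ on the next slide), and the function name is made up.

```python
from fractions import Fraction

# Hypothetical edge list on nodes 1..9 (an assumption, not the slide's figure).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6),
         (5, 7), (6, 7), (7, 8), (7, 9), (8, 9)]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def estimate_global_cc(adj, walk):
    """Return (Phi_g, Psi_g, Phi_g / Psi_g) for a walk of at least 3 nodes."""
    r = len(walk)
    deg = [len(adj[x]) for x in walk]
    # Psi_g: average of (degree - 1) over all sampled nodes.
    psi = Fraction(sum(d - 1 for d in deg), r)
    # Phi_g: average of phi_k * d_k over interior steps, where phi_k = 1
    # iff the previous and next nodes of the walk are adjacent.
    total = 0
    for k in range(1, r - 1):
        if walk[k + 1] in adj[walk[k - 1]]:
            total += deg[k]
    phi = Fraction(total, r - 2)
    return phi, psi, phi / psi


ADJ = adjacency(EDGES)
print(estimate_global_cc(ADJ, [1, 2, 3, 4, 5]))
# (Fraction(2, 3), Fraction(7, 5), Fraction(10, 21))
```

On this hypothetical graph the walk $v_1, \dots, v_5$ gives $\phi_2 = 0$, $\phi_3 = 1$, $\phi_4 = 0$. A walk this short is far from converged, so the estimate $10/21$ differs from the graph's exact value.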

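13 Global CC Example
$\phi_2 = 0$, $\phi_3 = 1$, $\phi_4 = 0$.
$\Phi_g = \frac{2}{3}$, $\Psi_g = \frac{7}{5}$, so $\hat{c}_g = \Phi_g / \Psi_g = \frac{2/3}{7/5} = \frac{10}{21}$.
[Figure: example network; the intermediate sums and the exact value of $c_g$ shown on the slide did not survive extraction]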

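14 Expectation of $\phi_k d_k$
$E[\phi_k d_k] = \sum_{i=1}^{n} \frac{d_i}{D}\, E[\phi_k d_k \mid x_k = v_i] = \sum_{i=1}^{n} \frac{d_i}{D} \cdot \frac{2 l_i}{d_i d_i} \cdot d_i = \sum_{i=1}^{n} \frac{2 l_i}{D}$
The first equality is the law of total expectation. Given $x_k = v_i$, there are $d_i \cdot d_i$ combinations of previous and next node, and $2 l_i$ of them yield $\phi_k = 1$.
$d_i$: the degree of node $v_i$. $l_i$: the number of triangles that contain $v_i$. $n$: the number of nodes. $D = \sum_{i=1}^{n} d_i$.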

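15 Global CC Proof
$E[\Phi_g] = E[\phi_k d_k] = \frac{2}{D} \sum_{i=1}^{n} l_i$, $\quad E[\Psi_g] = \frac{1}{D} \sum_{i=1}^{n} d_i (d_i - 1)$
$\hat{c}_g = \dfrac{\Phi_g}{\Psi_g} \approx \dfrac{E[\Phi_g]}{E[\Psi_g]} = \dfrac{2 \sum_{i=1}^{n} l_i}{\sum_{i=1}^{n} d_i (d_i - 1)} = c_g$, where the approximation follows from concentration bounds on $\Phi_g$ and $\Psi_g$.
$d_i$: the degree of node $v_i$. $l_i$: the number of triangles that contain $v_i$. $n$: the number of nodes. $D = \sum_{i=1}^{n} d_i$.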

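16 Guarantees
For any $\varepsilon \le \frac{1}{8}$ and $\delta \le 1$, we have
$\Pr\left[(1 - \varepsilon)\, c_g \le \hat{c}_g \le (1 + \varepsilon)\, c_g\right] \ge 1 - \delta$
when the number of samples, $r$, satisfies $r \ge r_g = O(\text{mixing time}(\varepsilon))$.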

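17 Network Average CC Algorithm
The estimated network average CC: $\hat{c}_l = \Phi_l / \Psi_l$.
1. $\Psi_l$: the sampled nodes' average of $1/\text{degree}$, i.e. the average of $1/d_k$.
$\phi_k = 1$ if there is an edge $v_{k-1} v_{k+1}$, $0$ otherwise.
2. $\Phi_l$: the sampled nodes' average of $\phi_k \cdot \frac{1}{d_k - 1}$.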
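A sketch of this estimator, mirroring the two averages above. The graph is hypothetical, the function name is illustrative, and the normalization ($r$ samples for $\Psi_l$, $r - 2$ interior samples for $\Phi_l$) is an assumption carried over from the global CC example rather than something this slide states.

```python
from fractions import Fraction

# Hypothetical edge list on nodes 1..9 (an assumption, not the slide's figure).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6),
         (5, 7), (6, 7), (7, 8), (7, 9), (8, 9)]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def estimate_average_cc(adj, walk):
    """Return (Phi_l, Psi_l, Phi_l / Psi_l) for a walk of at least 3 nodes."""
    r = len(walk)
    deg = [len(adj[x]) for x in walk]
    # Psi_l: average of 1 / degree over all sampled nodes.
    psi = sum(Fraction(1, d) for d in deg) / r
    # Phi_l: average of phi_k / (d_k - 1) over interior steps.
    total = Fraction(0)
    for k in range(1, r - 1):
        # A degree-1 node forces the walk to backtrack, so phi_k = 0 there.
        if deg[k] > 1 and walk[k + 1] in adj[walk[k - 1]]:
            total += Fraction(1, deg[k] - 1)
    phi = total / (r - 2)
    return phi, psi, phi / psi


ADJ = adjacency(EDGES)
print(estimate_average_cc(ADJ, [1, 2, 3, 4, 5]))
# (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3))
```

The weights $1/d_k$ and $1/(d_k - 1)$ undo the degree bias of the walk, so the ratio targets the plain average of the local coefficients.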

18 Evaluations
Table columns: Network, $n$ (size), $D/n$, $c_l$, $c_g$. The numeric cells are truncated in this transcription:
DBLP 977,
Orkut 3,072,
Flickr 2,173,
Live Journal 4,843,
DBLP facts: the paper with the most co-authors has 119 listed authors; the most prolific author is Vincent Poor, with 798 entries.

19 Global CC, DBLP Network
[Figure: relative estimation value versus percentage of mined nodes; series: Gjoka et al*, Ribeiro et al*, This work]
Relative improvement ranges between 300% and 500% depending on the network.

20 Network Average CC, Orkut Network
[Figure: relative estimation value versus percentage of mined nodes; series: Ribeiro et al, Gjoka et al, Random walk]
Relative improvement ranges between 50% and 400% depending on the network.

21 Conclusions
1. New external access estimator for the Global Clustering Coefficient.
2. Improved estimator for the Network Average Clustering Coefficient.
3. Improved estimator for the number of registered users.

22 Estimating Sizes of Social Networks via Biased Sampling
Liran Katzir, Edo Liberty, Oren Somekh (Yahoo! Labs, Haifa, Israel)

23 The Birthday Paradox
The expected number of collisions in a list of $r$ i.i.d. uniform samples from a set of $n$ elements is $\dfrac{r(r-1)}{2n}$.
A collision is a pair of identical samples.
Example: Samples: $X = (d, b, b, a, b, e)$. Total 3 collisions: $(x_2, x_3)$, $(x_2, x_5)$, and $(x_3, x_5)$.

24 Cardinality estimation: uniform
When $C$ collisions are observed, $n \approx \dfrac{r(r-1)}{2C}$.
Needs $r = O(\sqrt{n})$ samples to converge.
Used by [Ye et al, 2010] to estimate the size.
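The collision count and the resulting estimate take only a few lines. This is a minimal sketch with illustrative function names; it uses the sample list from the birthday paradox slide, and raising an error when no collision is observed is a design choice of the sketch.

```python
from collections import Counter
from fractions import Fraction


def count_collisions(samples):
    """Number of unordered pairs of identical samples."""
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())


def expected_collisions(r, n):
    """Expected collisions among r uniform i.i.d. samples from n elements."""
    return Fraction(r * (r - 1), 2 * n)


def estimate_size_uniform(samples):
    """Invert the expectation: n is approximately r (r - 1) / (2 C)."""
    r = len(samples)
    c = count_collisions(samples)
    if c == 0:
        raise ValueError("no collisions observed; draw more samples")
    return Fraction(r * (r - 1), 2 * c)


X = ["d", "b", "b", "a", "b", "e"]
print(count_collisions(X))       # 3
print(estimate_size_uniform(X))  # 5
```

A value that appears $c$ times contributes $c(c-1)/2$ pairs, which is why three copies of `b` give three collisions.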

25 Stationary distribution sampling
Sampled nodes: $v_5, v_2, v_5, v_4, v_2$.
Stationary distribution: $\pi_i = \dfrac{d_i}{\sum_j d_j}$.
[Figure: example network on nodes $v_1, \dots, v_9$]

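26 Cardinality estimation: stationary
When $C$ collisions are observed, $n \approx \dfrac{\left(\sum_x d_x\right)\left(\sum_x \frac{1}{d_x}\right)}{2C}$, where the sums run over the sampled nodes $x$.
Needs $r = O(\sqrt[4]{n}\, \log n)$ samples to converge when $d_i \sim \text{zipf}(n, 2)$.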
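The degree-weighted estimate can be sketched as follows. The graph is a hypothetical stand-in for the slide's figure, the function name is illustrative, and the sample list reuses the sampled nodes shown on the previous slide. The numeric result therefore says nothing about the slide's own example.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical edge list on nodes 1..9 (an assumption, not the slide's figure).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6),
         (5, 7), (6, 7), (7, 8), (7, 9), (8, 9)]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def estimate_size_stationary(adj, samples):
    """n is approximately (sum of d_x) * (sum of 1 / d_x) / (2 C)."""
    deg = [len(adj[x]) for x in samples]
    collisions = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    if collisions == 0:
        raise ValueError("no collisions observed; draw more samples")
    sum_deg = sum(deg)
    sum_inv = sum(Fraction(1, d) for d in deg)
    return sum_deg * sum_inv / (2 * collisions)


ADJ = adjacency(EDGES)
print(estimate_size_stationary(ADJ, [5, 2, 5, 4, 2]))  # 25/4
```

When every sampled node has the same degree, the two sums multiply to $r^2$ and the formula reduces to $r^2 / (2C)$, close to the uniform estimate.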

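27 Example
Sampled nodes: $v_5, v_2, v_5, v_4, v_2$.
[Figure: example network on nodes $v_1, \dots, v_9$ with the worked degree sums and the resulting estimate of $n$; the numeric values did not survive extraction]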

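28 Global CC Proof
Per sample, under the stationary distribution:
$E[d_x] = \sum_{i=1}^{n} \frac{d_i}{D}\, d_i$, $\quad E\left[\frac{1}{d_x}\right] = \sum_{i=1}^{n} \frac{d_i}{D} \cdot \frac{1}{d_i} = \frac{n}{D}$
Per pair of samples, the collision probability is $\sum_{i=1}^{n} \frac{d_i}{D} \cdot \frac{d_i}{D}$, so $E[C] = \binom{r}{2} \sum_{i=1}^{n} \frac{d_i}{D} \cdot \frac{d_i}{D}$.
By concentration bounds on the numerator and on $C$:
$\dfrac{\left(\sum_x d_x\right)\left(\sum_x \frac{1}{d_x}\right)}{2C} \approx \dfrac{r\,E[d_x] \cdot r\,E\left[\frac{1}{d_x}\right]}{2\,E[C]} = \dfrac{r^2 \left(\sum_{i=1}^{n} \frac{d_i}{D}\, d_i\right) \frac{n}{D}}{r (r-1) \sum_{i=1}^{n} \frac{d_i}{D} \cdot \frac{d_i}{D}} = \dfrac{r}{r-1}\, n \approx n$
$d_i$: the degree of node $v_i$. $n$: the number of nodes. $D = \sum_{i=1}^{n} d_i$.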

29 Improvements 1. Using all samples (Hardiman et al 2009). 2. Using Conditional Monte Carlo (This work).

30 All Samples
Restrict computation to indexes at least $m$ steps apart: $I = \{(k, l) : |k - l| \ge m\}$.
A collision is only considered within $I$: $\Phi = \sum_{(k, l) \in I} 1_{\{x_k = x_l\}}$.
The ratio of degrees is similarly defined: $\Psi = \sum_{(k, l) \in I} \dfrac{d_{x_k}}{d_{x_l}}$.
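The two sums over $I$ can be sketched directly. Several things here are assumptions rather than statements from the slide: the graph is hypothetical, $I$ is read as ordered pairs with $|k - l| \ge m$ and $m \ge 1$, and the size estimate is taken to be $\Psi / \Phi$, which is consistent with the per-pair expectations on slide 28 but is not written on this slide.

```python
from fractions import Fraction

# Hypothetical edge list on nodes 1..9 (an assumption, not the slide's figure).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6),
         (5, 7), (6, 7), (7, 8), (7, 9), (8, 9)]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def all_samples_estimate(adj, walk, m):
    """Return (Phi, Psi, Psi / Phi) over ordered pairs with |k - l| >= m."""
    if m < 1:
        raise ValueError("m must be at least 1 so that k != l")
    phi = 0
    psi = Fraction(0)
    for k in range(len(walk)):
        for l in range(len(walk)):
            if abs(k - l) >= m:
                if walk[k] == walk[l]:
                    phi += 1  # collision between steps k and l
                psi += Fraction(len(adj[walk[k]]), len(adj[walk[l]]))
    if phi == 0:
        raise ValueError("no collisions observed; extend the walk")
    return phi, psi, psi / phi


ADJ = adjacency(EDGES)
print(all_samples_estimate(ADJ, [2, 3, 4, 2], 2))
# (2, Fraction(37, 6), Fraction(37, 12))
```

The double loop is quadratic in the walk length and is written this way for clarity only.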

31 Conditional Monte Carlo
A collision between $x_k$ and $x_l$ is replaced by the conditional collision in steps $k+1$ and $l+1$ respectively:
$E\left[1_{\{x_{k+1} = x_{l+1}\}} \mid x_k, x_l\right] = \dfrac{\#\text{common neighbors of } x_k \text{ and } x_l}{d_{x_k}\, d_{x_l}}$

32 Conditional Monte Carlo
The pair $v_4, v_7$ is not a collision, but it contributes $\frac{1}{12}$ to the collision counter.
[Figure: example network on nodes $v_1, \dots, v_9$]
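The conditional contribution of a pair is a one-line computation once the neighbor sets are known. In the sketch below the graph is hypothetical, chosen so that $v_4$ and $v_7$ have degrees 3 and 4 and one common neighbor, which reproduces the $\frac{1}{12}$ quoted above; the function name is illustrative.

```python
from fractions import Fraction

# Hypothetical edge list on nodes 1..9 (an assumption, not the slide's figure).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6),
         (5, 7), (6, 7), (7, 8), (7, 9), (8, 9)]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conditional_collision(adj, a, b):
    """Probability that independent next steps from a and b coincide."""
    common = len(adj[a] & adj[b])
    return Fraction(common, len(adj[a]) * len(adj[b]))


ADJ = adjacency(EDGES)
print(conditional_collision(ADJ, 4, 7))  # 1/12
print(conditional_collision(ADJ, 2, 2))  # 1/3, i.e. 1/d for an actual collision
```

Pairs with no common neighbor contribute zero, while many non-colliding pairs contribute a small positive amount, so the counter moves far more often than a plain collision count does.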

33 Size Estimation, DBLP Network
[Figure: relative estimation value versus percentage of mined nodes; series: Prior art, This work]

34 Thanks


More information

RESEARCH ARTICLE. Online quantization in nonlinear filtering

RESEARCH ARTICLE. Online quantization in nonlinear filtering Journal of Statistical Computation & Simulation Vol. 00, No. 00, Month 200x, 3 RESEARCH ARTICLE Online quantization in nonlinear filtering A. Feuer and G. C. Goodwin Received 00 Month 200x; in final form

More information

Mining Triadic Closure Patterns in Social Networks

Mining Triadic Closure Patterns in Social Networks Mining Triadic Closure Patterns in Social Networks Hong Huang, University of Goettingen Jie Tang, Tsinghua University Sen Wu, Stanford University Lu Liu, Northwestern University Xiaoming Fu, University

More information

An Efficient reconciliation algorithm for social networks

An Efficient reconciliation algorithm for social networks An Efficient reconciliation algorithm for social networks Silvio Lattanzi (Google Research NY) Joint work with: Nitish Korula (Google Research NY) ICERM Stochastic Graph Models Outline Graph reconciliation

More information

Privacy in Statistical Databases

Privacy in Statistical Databases Privacy in Statistical Databases Individuals x 1 x 2 x n Server/agency ) answers. A queries Users Government, researchers, businesses or) Malicious adversary What information can be released? Two conflicting

More information

Museumpark Revisit: A Data Mining Approach in the Context of Hong Kong. Keywords: Museumpark; Museum Demand; Spill-over Effects; Data Mining

Museumpark Revisit: A Data Mining Approach in the Context of Hong Kong. Keywords: Museumpark; Museum Demand; Spill-over Effects; Data Mining Chi Fung Lam The Chinese University of Hong Kong Jian Ming Luo City University of Macau Museumpark Revisit: A Data Mining Approach in the Context of Hong Kong It is important for tourism managers to understand

More information

Bias Correction in Clustering Coefficient Estimation

Bias Correction in Clustering Coefficient Estimation Bias Correction in Clustering Coefficient Estimation Roohollah Etemadi, Jianguo Lu School of Comuter Science, University of Windsor Windsor, ON, Canada etemadir, jlu@uwindsor.ca Abstract Clustering coefficient

More information

OLAK: An Efficient Algorithm to Prevent Unraveling in Social Networks. Fan Zhang 1, Wenjie Zhang 2, Ying Zhang 1, Lu Qin 1, Xuemin Lin 2

OLAK: An Efficient Algorithm to Prevent Unraveling in Social Networks. Fan Zhang 1, Wenjie Zhang 2, Ying Zhang 1, Lu Qin 1, Xuemin Lin 2 OLAK: An Efficient Algorithm to Prevent Unraveling in Social Networks Fan Zhang 1, Wenjie Zhang 2, Ying Zhang 1, Lu Qin 1, Xuemin Lin 2 1 University of Technology Sydney, Computer 2 University Science

More information

Online Social Networks and Media. Opinion formation on social networks

Online Social Networks and Media. Opinion formation on social networks Online Social Networks and Media Opinion formation on social networks Diffusion of items So far we have assumed that what is being diffused in the network is some discrete item: E.g., a virus, a product,

More information

SAMPLING AND INVERSION

SAMPLING AND INVERSION SAMPLING AND INVERSION Darryl Veitch dveitch@unimelb.edu.au CUBIN, Department of Electrical & Electronic Engineering University of Melbourne Workshop on Sampling the Internet, Paris 2005 A TALK WITH TWO

More information

ECS 253 / MAE 253, Lecture 15 May 17, I. Probability generating function recap

ECS 253 / MAE 253, Lecture 15 May 17, I. Probability generating function recap ECS 253 / MAE 253, Lecture 15 May 17, 2016 I. Probability generating function recap Part I. Ensemble approaches A. Master equations (Random graph evolution, cluster aggregation) B. Network configuration

More information

SocViz: Visualization of Facebook Data

SocViz: Visualization of Facebook Data SocViz: Visualization of Facebook Data Abhinav S Bhatele Department of Computer Science University of Illinois at Urbana Champaign Urbana, IL 61801 USA bhatele2@uiuc.edu Kyratso Karahalios Department of

More information

ECEN 689 Special Topics in Data Science for Communications Networks

ECEN 689 Special Topics in Data Science for Communications Networks ECEN 689 Special Topics in Data Science for Communications Networks Nick Duffield Department of Electrical & Computer Engineering Texas A&M University Lecture 8 Random Walks, Matrices and PageRank Graphs

More information

A Nearly Sublinear Approximation to exp{p}e i for Large Sparse Matrices from Social Networks

A Nearly Sublinear Approximation to exp{p}e i for Large Sparse Matrices from Social Networks A Nearly Sublinear Approximation to exp{p}e i for Large Sparse Matrices from Social Networks Kyle Kloster and David F. Gleich Purdue University December 14, 2013 Supported by NSF CAREER 1149756-CCF Kyle

More information

Structural Data De-anonymization: Quantification, Practice, and Implications

Structural Data De-anonymization: Quantification, Practice, and Implications Structural Data De-anonymization: Quantification, Practice, and Implications ABSTRACT Shouling Ji School of Electrical and Computer Engineering Georgia Institute of Technology sji@gatech.edu Mudhakar Srivatsa

More information

Maximizing Circle of Trust in Online Social Networks

Maximizing Circle of Trust in Online Social Networks Maximizing Circle of Trust in Online Social Networks Yilin Shen, Yu-Song Syu, Dung T. Nguyen, My T. Thai Department of Computer and Information Science and Engineering University of Florida, USA {yshen,

More information

Constructing Guaranteed Automatic Numerical Algorithms for U

Constructing Guaranteed Automatic Numerical Algorithms for U Constructing Guaranteed Automatic Numerical Algorithms for Univariate Integration Department of Applied Mathematics, Illinois Institute of Technology July 10, 2014 Contents Introduction.. GAIL What do

More information

Lecture 15: MCMC Sanjeev Arora Elad Hazan. COS 402 Machine Learning and Artificial Intelligence Fall 2016

Lecture 15: MCMC Sanjeev Arora Elad Hazan. COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 15: MCMC Sanjeev Arora Elad Hazan COS 402 Machine Learning and Artificial Intelligence Fall 2016 Course progress Learning from examples Definition + fundamental theorem of statistical learning,

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13 Indexes for Multimedia Data 13 Indexes for Multimedia

More information

Basics and Random Graphs con0nued

Basics and Random Graphs con0nued Basics and Random Graphs con0nued Social and Technological Networks Rik Sarkar University of Edinburgh, 2017. Random graphs on jupyter notebook Solu0on to exercises 1 is out If your BSc/MSc/PhD work is

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 Web pages are not equally important www.joe-schmoe.com

More information

Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)

Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.) Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.) Which pair of nodes {i,j} should be connected? Variant: node i is given Alice Bob Charlie Friend

More information