Estimating Clustering Coefficients and Size of Social Networks via Random Walk
1 Estimating Clustering Coefficients and Size of Social Networks via Random Walk. Stephen J. Hardiman* (Capital Fund Management, France) and Liran Katzir (Advanced Technology Labs, Microsoft Research, Israel). *Research was conducted while the author was unaffiliated.
2 Motivation: Social Networks. Qzone, Netlog, Google+, Bebo, Twitter, Facebook, Classmates.com, Sina Weibo, Sonico.com, Orkut, Renren, Habbo, Flixster, MyLife, Tagged, Friendster, hi5, LinkedIn, Vkontakte, Plaxo.
3 Motivation: External access. Slide labels: Social Analytics; The online social network; Privacy, Disk Space, Communication. [Figure: example network on nodes $v_1, \dots, v_9$.]
4 Task: Estimate parameters. Global Clustering Coefficient; Network Average CC; Number of Registered Users. Predicting social products' potential. Business development / advertisement / market size.
5 Global Clustering Coefficient. $\text{Global CC} = \dfrac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$. [Figure: example network on nodes $v_1, \dots, v_9$ with one triangle and one connected triplet highlighted.]
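The definition above can be checked on a small graph. The following is a minimal sketch, assuming the graph is available in full as a dictionary of neighbor sets; the function name `global_cc` and the toy graph are illustrative and not taken from the slides.

```python
from itertools import combinations

def global_cc(adj):
    """Exact global CC of an undirected graph given as {node: set_of_neighbors}."""
    # Summing linked neighbor pairs over all nodes counts every triangle
    # three times (once per corner), which is the numerator 3 * #triangles.
    closed = sum(
        1
        for v in adj
        for a, b in combinations(adj[v], 2)
        if b in adj[a]
    )
    # A node of degree d is the center of d * (d - 1) / 2 connected triplets.
    triplets = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triplets if triplets else 0.0

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(global_cc(toy))  # one triangle (3 closed triplets) out of 5 triplets
```

This exact computation reads the whole graph, which is what the external-access setting of the talk rules out; it is useful mainly as ground truth for small tests.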
6 Global Clustering Coefficient. Exact: [Alon et al, 1997]. Estimation, input is read at least once: Random Access: [Avron, 2010]; Streaming Model: [Buriol et al, 2006]. Estimation, sampling: Random Access: [Schank et al, 2005]; External Access: This work.
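7 Local Clustering Coefficient. $C_i = \dfrac{\#\text{connections between } v_i\text{'s neighbors}}{d_i (d_i - 1)/2}$, where $d_i$ is the degree of node $i$. In the example network, $d_1 = 1$, $d_2 = 3$, $d_9 = 2$, and $C_2 = 1/3$. Network Average CC = average local CC. [Figure: example network on nodes $v_1, \dots, v_9$.]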
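A short sketch of the local coefficient and its network average, again assuming full access to a dictionary of neighbor sets. The names `local_cc` and `network_average_cc` are illustrative, and setting $C_i = 0$ for nodes of degree below 2 is a convention assumed here, not something the slide states.

```python
from itertools import combinations

def local_cc(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    d = len(adj[v])
    if d < 2:
        return 0.0  # assumed convention: no neighbor pairs, so C_i = 0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (d * (d - 1) / 2)

def network_average_cc(adj):
    """Average of the local coefficients over all nodes."""
    return sum(local_cc(adj, v) for v in adj) / len(adj)

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(local_cc(toy, "c"))       # degree 3, one linked neighbor pair out of three
print(network_average_cc(toy))
```

Node `c` in the toy graph has degree 3 and one connected neighbor pair, so it gives the same value as the slide's $C_2 = 1/3$ for a degree-3 node.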
8 Network Average CC. Exact: naïve. Estimation, input is read at least once: Streaming Model: [Becchetti et al, 2010]. Estimation, sampling: Random Access: [Schank et al, 2005]; External Access: [Ribeiro et al 2010], [Gjoka et al, 2010], This work (improved accuracy).
9 Number of Registered Users. Exact: trivial. Estimation, sampling: External Access: [Hardiman et al 2009], [Katzir et al, 2011], This work (improved accuracy).
10 Random Walk. Sampled nodes: $v_1, v_2, v_3, v_4, v_5$. Stationary distribution: $\pi_i = d_i / \sum_{j} d_j$. [Figure: example network on nodes $v_1, \dots, v_9$.]
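The sampling step can be sketched as follows. This is a minimal illustration, assuming a simple random walk that moves to a uniformly chosen neighbor at each step; the dictionary `toy` stands in for the neighbor-list queries an external crawler would issue, and all names are hypothetical. The second function returns the degree-proportional stationary distribution of such a walk as exact fractions.

```python
import random
from fractions import Fraction

def random_walk(adj, start, steps, rng):
    """Simple random walk: each step moves to a uniformly random neighbor."""
    walk = [start]
    for _ in range(steps):
        # sorted() only makes the choice reproducible for a seeded rng
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    return walk

def stationary(adj):
    """Stationary distribution of the simple random walk: pi_i = d_i / D."""
    D = sum(len(nb) for nb in adj.values())
    return {v: Fraction(len(adj[v]), D) for v in adj}

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(random_walk(toy, "a", 10, random.Random(0)))
print(stationary(toy))
```

Because high-degree nodes are visited more often, any estimator built on these samples has to correct for the degree bias, which is what the degree weights in the later slides do.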
11 Random Walk - Summary. Legend: sampled nodes, visible nodes, invisible nodes, visible edges, invisible edges. [Figure: example network on nodes $v_1, \dots, v_9$.]
12 Global CC Algorithm. The estimated global clustering coefficient: $\hat{c}_g = \Phi_g / \Psi_g$.
1. $\Psi_g$: the sampled nodes' average of $d_k - 1$ (degree minus one).
$\phi_k = 1$ if there is an edge $v_{k-1} v_{k+1}$ (that is, iff $v_{k-1}, v_k, v_{k+1}$ is a triangle), $0$ otherwise.
2. $\Phi_g$: the sampled nodes' average of $\phi_k d_k$.
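The two averages can be computed from a recorded walk roughly as below. This is a sketch under stated assumptions: $\Psi_g$ is taken as the average of $d_k - 1$ over all $r$ samples, which is what the expectation $E[\Psi_g] = \frac{1}{D}\sum_i d_i(d_i - 1)$ on the proof slide implies; $\Phi_g$ is averaged over the $r - 2$ interior steps, since $\phi_k$ needs both a predecessor and a successor. The function name and the toy graph are hypothetical.

```python
def estimate_global_cc(adj, walk):
    """Estimate the global CC from a random walk given as a list of nodes."""
    r = len(walk)
    if r < 3:
        raise ValueError("need at least three samples")
    deg = [len(adj[x]) for x in walk]
    # phi_k = 1 iff the nodes before and after step k are adjacent;
    # Phi_g averages phi_k * d_k over the r - 2 interior steps.
    phi_g = sum(
        deg[k] for k in range(1, r - 1) if walk[k + 1] in adj[walk[k - 1]]
    ) / (r - 2)
    # Psi_g averages d_k - 1 over all samples (assumption, see lead-in).
    psi_g = sum(d - 1 for d in deg) / r
    if psi_g == 0:
        raise ValueError("all sampled nodes have degree 1")
    return phi_g / psi_g

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(estimate_global_cc(toy, ["a", "b", "c", "d", "c"]))
```

Only neighbor-list lookups for visited nodes and their walk neighbors are needed, so the routine fits the external-access model; a five-step walk is far too short to be accurate and is used here only to keep the arithmetic easy to follow.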
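13 Global CC Example. $\phi_2 = 0$, $\phi_3 = 1$, $\phi_4 = 0$. $\Phi_g = \frac{2}{3}$, $\Psi_g = \frac{7}{5}$, so $\hat{c}_g = \Phi_g / \Psi_g = \frac{2/3}{7/5} = \frac{10}{21}$. [Figure: walk over the example network on nodes $v_1, \dots, v_7$.]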
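14 Expectation of $\phi_k d_k$. By total expectation,
$E[\phi_k d_k] = \sum_{i=1}^{n} \frac{d_i}{D}\, E[\phi_k d_k \mid x_k = v_i] = \sum_{i=1}^{n} \frac{d_i}{D} \cdot \frac{2 l_i}{d_i \cdot d_i} \cdot d_i = \frac{2}{D} \sum_{i=1}^{n} l_i$.
Given $x_k = v_i$, there are $d_i \cdot d_i$ combinations of $(x_{k-1}, x_{k+1})$, and $2 l_i$ of them yield $\phi_k = 1$.
Notation: $d_i$ is the degree of node $v_i$; $l_i$ is the number of triangles that contain $v_i$; $n$ is the number of nodes; $D = \sum_{i=1}^{n} d_i$.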
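15 Global CC Proof. $E[\Phi_g] = E[\phi_k d_k] = \frac{2}{D} \sum_{i=1}^{n} l_i$ and $E[\Psi_g] = \frac{1}{D} \sum_{i=1}^{n} d_i (d_i - 1)$.
By concentration bounds, $\hat{c}_g = \frac{\Phi_g}{\Psi_g} \approx \frac{E[\Phi_g]}{E[\Psi_g]} = \frac{2 \sum_{i=1}^{n} l_i}{\sum_{i=1}^{n} d_i (d_i - 1)} = c_g$.
Notation: $d_i$ is the degree of node $v_i$; $l_i$ is the number of triangles that contain $v_i$; $n$ is the number of nodes; $D = \sum_{i=1}^{n} d_i$.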
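16 Guarantees. For any $\epsilon \le \frac{1}{8}$ and $\delta \le 1$, we have $\Pr\left[(1 - \epsilon)\, c_g \le \hat{c}_g \le (1 + \epsilon)\, c_g\right] \ge 1 - \delta$ when the number of samples, $r$, satisfies $r \ge r_g = O(\text{mixing time}(\epsilon))$.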
17 Network Average CC Algorithm. The estimated network average CC: $\hat{c}_l = \Phi_l / \Psi_l$.
1. $\Psi_l$: the sampled nodes' average of $1/d_k$.
$\phi_k = 1$ if there is an edge $v_{k-1} v_{k+1}$, $0$ otherwise.
2. $\Phi_l$: the sampled nodes' average of $\phi_k \cdot \frac{1}{d_k - 1}$.
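A sketch of this estimator, mirroring the slide's two averages. As an assumption, $\Phi_l$ is averaged over the $r - 2$ interior steps (where $\phi_k$ is defined) and $\Psi_l$ over all $r$ samples; the function name and the toy graph are hypothetical.

```python
def estimate_average_cc(adj, walk):
    """Estimate the network average CC from a random walk (list of nodes)."""
    r = len(walk)
    if r < 3:
        raise ValueError("need at least three samples")
    deg = [len(adj[x]) for x in walk]
    # When phi_k = 1, step k has two distinct adjacent neighbors,
    # so d_k >= 2 and the weight 1 / (d_k - 1) is well defined.
    phi_l = sum(
        1 / (deg[k] - 1)
        for k in range(1, r - 1)
        if walk[k + 1] in adj[walk[k - 1]]
    ) / (r - 2)
    # Psi_l averages 1 / d_k, which undoes the degree bias of the walk.
    psi_l = sum(1 / d for d in deg) / r
    return phi_l / psi_l

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(estimate_average_cc(toy, ["a", "b", "c", "d", "c"]))
```

The structure is the same as for the global estimator; only the per-step weights change, from $d_k$ and $d_k - 1$ to $1/(d_k - 1)$ and $1/d_k$.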
18 Evaluations. Table columns: Network, $n$ (size), $D/n$, $c_l$, $c_g$. Rows: DBLP ($n$ = 977,…), Orkut ($n$ = 3,072,…), Flickr ($n$ = 2,173,…), Live Journal ($n$ = 4,843,…). DBLP facts: the paper with the most co-authors has 119 listed authors; the most prolific author is Vincent Poor, with 798 entries.
19 Relative estimation value. [Figure: Global CC, DBLP network. x-axis: percentage of mined nodes; y-axis: relative estimation value. Series: Gjoka et al*, Ribeiro et al*, This work.] Relative improvement ranges between 300% and 500% depending on the network.
20 Relative estimation value. [Figure: Network Average CC, Orkut network. x-axis: percentage of mined nodes; y-axis: relative estimation value. Series: Ribeiro et al, Gjoka et al, Random walk.] Relative improvement ranges between 50% and 400% depending on the network.
21 Conclusions. 1. New external access estimator for the Global Clustering Coefficient. 2. Improved estimator for the Network Average Clustering Coefficient. 3. Improved estimator for the number of registered users.
22 Estimating Sizes of Social Networks via Biased Sampling. Liran Katzir, Edo Liberty, and Oren Somekh (Yahoo! Labs, Haifa, Israel).
23 The Birthday Paradox. A collision is a pair of identical samples. The expected number of collisions in a list of $r$ i.i.d. uniform samples from a set of $n$ elements is $\frac{r(r-1)}{2n}$. Example: samples $X = (d, b, b, a, b, e)$. Total 3 collisions: $(x_2, x_3)$, $(x_2, x_5)$, and $(x_3, x_5)$.
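The collision count in the example can be reproduced with a few lines. This is a small sketch; the function names are illustrative, and the expected-value formula assumes uniform i.i.d. samples.

```python
from collections import Counter

def count_collisions(samples):
    """Number of unordered pairs of positions holding identical samples."""
    # A value seen m times contributes m * (m - 1) / 2 colliding pairs.
    return sum(m * (m - 1) // 2 for m in Counter(samples).values())

def expected_collisions(r, n):
    """Expected collisions among r uniform i.i.d. samples from n elements."""
    return r * (r - 1) / (2 * n)

X = ["d", "b", "b", "a", "b", "e"]  # the slide's example
print(count_collisions(X))           # b appears three times: 3 pairs
print(expected_collisions(len(X), 5))
```

Counting by multiplicity avoids the quadratic loop over all pairs of positions.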
24 Cardinality estimation: uniform. When $C$ collisions are observed, $n \approx \frac{r(r-1)}{2C}$. Needs $r = O(\sqrt{n})$ samples to converge. Used by [Ye et al, 2010] to estimate the size.
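Inverting the birthday-paradox expectation gives the uniform-sampling size estimate. A minimal sketch, assuming uniform i.i.d. samples; the function name is hypothetical, and raising an error when no collision is observed is a design choice made here, since the estimate is undefined in that case.

```python
from collections import Counter

def estimate_size_uniform(samples):
    """Size estimate r * (r - 1) / (2 * C) from uniform i.i.d. samples."""
    r = len(samples)
    # C = number of unordered pairs of identical samples
    c = sum(m * (m - 1) // 2 for m in Counter(samples).values())
    if c == 0:
        raise ValueError("no collisions observed; draw more samples")
    return r * (r - 1) / (2 * c)

# Samples from the birthday-paradox example: r = 6 and C = 3.
print(estimate_size_uniform(["d", "b", "b", "a", "b", "e"]))
```

With so few samples the estimate is very noisy; it only becomes reliable once the number of collisions is reasonably large.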
25 Stationary distribution sampling. Sampled nodes: $v_5, v_2, v_5, v_4, v_2$. Stationary distribution: $\pi_i = d_i / \sum_{j} d_j$. [Figure: example network on nodes $v_1, \dots, v_9$.]
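26 Cardinality estimation: stationary. When $C$ collisions are observed, $n \approx \frac{\left(\sum_x d_x\right)\left(\sum_x \frac{1}{d_x}\right)}{2C}$, where both sums run over the sampled nodes. Needs $r = O(\sqrt[4]{n}\, \log n)$ samples to converge when d_i ~ zipf( n, 2).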
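The degree-weighted estimator can be sketched as below, assuming the samples are (approximately) independent draws from the stationary distribution and that each sampled node's degree is known. The function name, the sample list, and the degree table are hypothetical illustrations, not the slide's example.

```python
from collections import Counter

def estimate_size_stationary(samples, degree):
    """Size estimate (sum d_x) * (sum 1/d_x) / (2 * C) for degree-biased samples."""
    # C = number of unordered pairs of identical samples
    c = sum(m * (m - 1) // 2 for m in Counter(samples).values())
    if c == 0:
        raise ValueError("no collisions observed; draw more samples")
    sum_deg = sum(degree[x] for x in samples)
    sum_inv = sum(1 / degree[x] for x in samples)
    return sum_deg * sum_inv / (2 * c)

# Hypothetical samples and degrees (not taken from the slides).
degree = {"a": 2, "b": 2, "c": 3}
print(estimate_size_stationary(["c", "a", "c", "b", "a"], degree))
```

If every node has the same degree, the two sums multiply to $r^2$ and the formula reduces to roughly the uniform estimate $r(r-1)/(2C)$, which is a quick sanity check on the weighting.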
27 Example. Sampled nodes: $v_5, v_2, v_5, v_4, v_2$. $\sum_x d_x = \ldots$, $\sum_x \frac{1}{d_x} = \ldots$, $\hat{n} = \ldots$ [Figure: example network on nodes $v_1, \dots, v_9$.]
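28 Cardinality Estimation Proof. For a single sample $x$ from the stationary distribution, $E[d_x] = \sum_{i=1}^{n} \frac{d_i}{D} d_i$ and $E\left[\frac{1}{d_x}\right] = \sum_{i=1}^{n} \frac{d_i}{D} \cdot \frac{1}{d_i} = \frac{n}{D}$. For a single pair of independent samples, the collision probability is $\sum_{i=1}^{n} \frac{d_i}{D} \cdot \frac{d_i}{D}$, so with $r$ samples $E[C] = \binom{r}{2} \sum_{i=1}^{n} \frac{d_i^2}{D^2}$.
By concentration bounds, $\hat{n} = \frac{\left(\sum_x d_x\right)\left(\sum_x \frac{1}{d_x}\right)}{2C} \approx \frac{E\left[\sum_x d_x\right] E\left[\sum_x \frac{1}{d_x}\right]}{2 E[C]} = \frac{r^2 \left(\sum_i \frac{d_i^2}{D}\right) \frac{n}{D}}{r(r-1) \sum_i \frac{d_i^2}{D^2}} = \frac{r}{r-1}\, n \approx n$.
Notation: $d_i$ is the degree of node $v_i$; $n$ is the number of nodes; $D = \sum_{i=1}^{n} d_i$.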
29 Improvements 1. Using all samples (Hardiman et al 2009). 2. Using Conditional Monte Carlo (This work).
30 All Samples. Restrict computation to indexes at least $m$ steps apart: $I = \{(k, l) : |k - l| \ge m\}$. A collision is only considered within $I$: $\Phi = \sum_{(k,l) \in I} 1\{x_k = x_l\}$. The ratio of degrees is similarly defined: $\Psi = \sum_{(k,l) \in I} \frac{d_{x_k}}{d_{x_l}}$.
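The two sums over the index set can be sketched as follows. Two assumptions are made here and are not stated on the slide: the set $I$ contains ordered pairs whose indexes are at least $m$ apart, and the size estimate is the ratio $\Psi / \Phi$, which is what the expectation argument for the stationary estimator suggests. All names and the example walk are hypothetical.

```python
def all_samples_sums(walk, degree, m):
    """Return (Phi, Psi) over ordered index pairs (k, l) with |k - l| >= m."""
    phi = 0      # collisions between far-apart steps
    psi = 0.0    # sum of degree ratios d_{x_k} / d_{x_l}
    r = len(walk)
    for k in range(r):
        for l in range(r):
            if abs(k - l) >= m:
                phi += walk[k] == walk[l]
                psi += degree[walk[k]] / degree[walk[l]]
    return phi, psi

# Hypothetical walk and degrees on a toy graph (triangle a-b-c, pendant d on c).
degree = {"a": 2, "b": 2, "c": 3, "d": 1}
walk = ["a", "b", "c", "d", "c", "a", "b"]
phi, psi = all_samples_sums(walk, degree, m=3)
print(phi, psi, psi / phi)  # psi / phi is the assumed size estimate
```

Requiring a gap of $m$ steps is meant to make the paired samples nearly independent, so that consecutive, strongly correlated steps of the walk do not produce spurious collisions. The double loop is quadratic in the walk length; it is kept here for clarity rather than speed.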
31 Conditional Monte Carlo. A collision between $x_k$ and $x_l$ is replaced by the conditional collision in steps $k+1$ and $l+1$, respectively: $E\left[1\{x_{k+1} = x_{l+1}\} \mid x_k, x_l\right] = \frac{\#\text{common neighbors of } x_k \text{ and } x_l}{d_{x_k}\, d_{x_l}}$.
32 Conditional Monte Carlo. The pair $v_4, v_7$ is not a collision, but it contributes $\frac{1}{12}$ to the collision counter. [Figure: example network on nodes $v_1, \dots, v_9$.]
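The per-pair contribution can be computed directly from the two neighbor lists. A minimal sketch: the function name and the small graph are hypothetical, constructed so that two non-identical nodes of degree 3 and 4 share exactly one neighbor and therefore contribute $\frac{1}{12}$, the same value as in the slide's example.

```python
from fractions import Fraction

def conditional_collision(adj, u, v):
    """Probability that one uniform step from u and one from v land on the same node."""
    common = len(adj[u] & adj[v])
    return Fraction(common, len(adj[u]) * len(adj[v]))

# Hypothetical graph: u has neighbors w, p, q; v has neighbors w, s, t, z.
adj = {
    "u": {"w", "p", "q"},
    "v": {"w", "s", "t", "z"},
    "w": {"u", "v"},
    "p": {"u"}, "q": {"u"},
    "s": {"v"}, "t": {"v"}, "z": {"v"},
}
print(conditional_collision(adj, "u", "v"))
```

Replacing the 0/1 collision indicator by this conditional expectation lets pairs that did not collide still contribute a fractional amount, which is the usual variance-reduction effect of conditional Monte Carlo.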
33 Relative estimation value. [Figure: Size Estimation, DBLP network. x-axis: percentage of mined nodes; y-axis: relative estimation value. Series: Prior art, This work.]
34 Thanks