ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength of Ties

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength of Ties Prof. James She james.she@ust.hk 1

Last lecture 2

Selected works from Tutorial #2 From the "Betweenness vs Degree" scatter plot, it seems that, in general, the higher the betweenness, the higher the degree. On the other hand, high degree does not always mean high betweenness: there exist some nodes that have over 600 total degree while still having near-zero betweenness. Work from Samuel Chan; work from Tommy Lam 3

Understanding Betweenness 4

Understanding Betweenness 5

Announcements 1. Tutorial #3 tomorrow! 2. More technical / practical programming for data analytics! 3. Make sure you know the Matlab and Python basics soon [Chart: technical challenges / fun vs. time, Wk 3 to Wk 5] 6

Summary of this lecture 1. Centrality continued 2. Similarity and Tie Strength of Nodes 3. Introduction to Recommendation 7

Centrality continued 8

Recall: Adjacency Matrix Social graph and its adjacency matrix 9

Recall: Betweenness Centrality Intuition: how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops? Who has higher betweenness, X or Y? 10

Betweenness Centrality Formally, $C_B(i) = \sum_{j<k} g_{jk}(i) / g_{jk}$, where $g_{jk}$ = the number of geodesics (shortest paths) connecting j and k, and $g_{jk}(i)$ = the number of those geodesics that actor i lies on. 11

Betweenness Centrality Why do C and D each have betweenness 1? They are both on shortest paths for the pairs (A,E) and (B,E), and since each pair has two geodesics (one through C, one through D), each must share the credit: 1/2 + 1/2 = 1. 12
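The shared-credit computation above can be checked with a short script. This is a minimal sketch in plain Python, assuming the slide's figure has the edge set A-B, A-C, A-D, B-C, B-D, C-E, D-E (the figure itself is not in the transcript):

```python
from collections import deque
from itertools import combinations

def geodesics(adj, s, t):
    """Enumerate all shortest paths between s and t."""
    dist = {s: 0}                      # BFS distances from s
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    if t not in dist:
        return []
    paths, stack = [], [[s]]
    while stack:                       # grow paths one BFS layer at a time
        path = stack.pop()
        u = path[-1]
        if u == t:
            paths.append(path)
            continue
        for v in adj[u]:
            if dist.get(v) == dist[u] + 1 and dist[v] <= dist[t]:
                stack.append(path + [v])
    return paths

def betweenness(adj):
    """C_B(i) = sum over pairs j<k of g_jk(i) / g_jk (unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        paths = geodesics(adj, s, t)
        for p in paths:
            for v in p[1:-1]:          # credit only the interior nodes
                bc[v] += 1.0 / len(paths)
    return bc

edges = [("A","B"), ("A","C"), ("A","D"), ("B","C"), ("B","D"), ("C","E"), ("D","E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bc = betweenness(adj)
print(bc["C"], bc["D"])  # 1.0 1.0
```

With this edge set, C and D each collect 1/2 for (A,E) and 1/2 for (B,E), matching the slide.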

Betweenness vs Degree Centrality Data visualization: nodes are sized by degree and colored by betweenness. Can you spot nodes with high betweenness but relatively low degree? What about high degree but relatively low betweenness? 13

Closeness Centrality What if the node's importance is not simply due to the number of direct friends (degree centrality), or to being in between others (betweenness centrality), but to being close to everyone (closeness centrality)?

Closeness Centrality 15

Closeness Centrality $C_C(v_i) = \left[ \sum_{j=1}^{n} g(v_i, v_j) / (n-1) \right]^{-1}$. Example for node A in the path graph A - B - C - D - E: $C_C(A) = [(1+2+3+4)/4]^{-1} = [10/4]^{-1} = 0.4$.
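The closeness example can be reproduced with a BFS sketch, assuming the path graph A - B - C - D - E used above:

```python
from collections import deque

def closeness(adj, node):
    """C_C(v) = [ (sum of geodesic distances from v) / (n - 1) ]^(-1)."""
    dist = {node: 0}                   # BFS distances from the node
    q = deque([node])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    n = len(adj)
    return (n - 1) / sum(dist.values())

# Path graph A - B - C - D - E from the slide's example
adj = {"A": {"B"}, "B": {"A","C"}, "C": {"B","D"}, "D": {"C","E"}, "E": {"D"}}

print(closeness(adj, "A"))  # distances 1+2+3+4 = 10, so 4/10 = 0.4
```

Node C, in the middle of the path, scores higher (4/6), which matches the observation that high-closeness nodes sit in the middle of the graph.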

Closeness Centrality

Closeness vs Degree Centrality Data Visualization Degree denoted by size Closeness denoted by color Nodes with high closeness are located in the middle of the graph

Eigenvector Centrality An aggregated metric to characterize the "global" importance of a node, as opposed to "local" importance, e.g., PageRank (used in Google's early search engine). Node importance is due to the centralities of its neighbors. 19

Eigenvector Centrality Modified version: PageRank. PageRank: used by Google's search engine to rank web pages in terms of their relative importance (see also Wikipedia). The idea: a page p_i is relatively more important, with a higher PR(p_i), when it is linked to by many other important pages. 20

Eigenvector Centrality Consider the graph with a 5x5 adjacency matrix A. Let x be a 5x1 centrality vector of the nodes (initialized, e.g., with their degrees). 21

Eigenvector Centrality Multiply the matrix A by the vector x: the resulting value for each node is the sum of the centralities of its neighbors. 22

Eigenvector Centrality What if the process keeps repeating? x is updated repeatedly, eventually reaching an equilibrium where each node's in/out flow is balanced with its neighbors. The final x = {x_1, x_2, x_3, x_4, x_5} captures the centrality. 23

Eigenvector Centrality Recall linear algebra basics. Eigenvectors (for a square m x m matrix A): $Av = \lambda v$, where v is a (right) eigenvector and $\lambda$ its eigenvalue. There are at most m distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. For a symmetric matrix (such as the adjacency matrix of an undirected graph), eigenvectors for distinct eigenvalues are orthogonal: $\lambda_1 \neq \lambda_2 \Rightarrow v_1 \cdot v_2 = 0$. 24

Eigenvector Centrality Find the solution via eigenvalue decomposition: $A = U \Lambda U^{-1}$, where $\Lambda$ is diagonal. The columns of U are eigenvectors of A; the diagonal elements of $\Lambda$ are the eigenvalues of A. Under repeated multiplication by A, the effect of the largest eigenvalue dominates. 25

Eigenvector Centrality Take the eigenvector corresponding to the largest eigenvalue of $Ax = \lambda x$, e.g., in Matlab: [vector, value] = eig(A).

A =
0 1 0 0 0
1 0 1 1 0
0 1 0 1 1
0 1 1 0 1
0 0 1 1 0

The leading eigenvector is (0.180, 0.475, 0.537, 0.537, 0.407): nodes C and D have the highest eigenvector centrality. 26
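Instead of a full eigenvalue decomposition (eig in Matlab), the leading eigenvector can be approximated by repeated multiplication and re-normalization, which is exactly the "keep updating x" process described on the previous slide. A plain-Python sketch for the 5x5 matrix above:

```python
def power_iteration(A, steps=200):
    """Approximate the eigenvector of the largest eigenvalue of A
    by repeatedly computing y = A x and re-normalizing."""
    n = len(A)
    x = [1.0] * n
    for _ in range(steps):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # y = A x
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]      # keep x at unit length
    return x

# Adjacency matrix from the slide; nodes A, B, C, D, E map to indices 0..4
A = [[0, 1, 0, 0, 0],
     [1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1],
     [0, 1, 1, 0, 1],
     [0, 0, 1, 1, 0]]

x = power_iteration(A)
print([round(v, 3) for v in x])  # ~ [0.18, 0.475, 0.537, 0.537, 0.407]
```

The result matches the eig output on the slide: entries 3 and 4 (nodes C and D) are the largest.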

Importance of Nodes In summary: Degree Centrality: the node with the most direct connections. Closeness Centrality: nodes with the shortest paths to the other nodes. Betweenness Centrality: the node that connects the two groups of nodes to all other nodes. Eigenvector Centrality: the node whose neighbors are the most important. 27

10 min break 29

Strength of Ties and Similarity
Measurement: Similarity. Description: indicates the similarity between nodes by their common attributes (e.g., contacts or interests). Applications: determine tie strength, communities of nodes, recommendations, etc.
Measurement: Tie Strength. Description: indicates the strength of the link between nodes (e.g., frequency of interactions and duration of encounters). Applications: determine whether a link is a weak or strong connection, and other hidden or missing information. 30

Similarity Jaccard similarity: $J(U_i, U_j) = |U_i \cap U_j| / |U_i \cup U_j|$. Used to quantify the similarity between two sets $U_i$ and $U_j$. $0 \le J(U_i, U_j) \le 1$; 1: the two sets are identical; 0: the two sets have no common elements. 31

Recall User Profiles in Social Media 32

Learning from Profiles Tie strength based on attributes: descriptive attributes of nodes (e.g., interests, common friends, etc.). Consider node similarity based on these attributes. e.g., Users A and B have more common interests, so A-B is a strong tie (high similarity), while A-C is a weak tie (low similarity). User A: Reading, Film, Painting, Swimming. User B: Reading, Film, Singing. User C: TV Game, Hiking. 33

Learning from Interactions Tie strength based on contacts: structural features, e.g., the number of common friends, via Jaccard similarity of the friend sets. More common friends means higher similarity. Users A and B have more common friends, so A-B is a strong tie (high similarity): A is a friend of D, E, F and B is a friend of D, E, F. A-C is a weak tie (low similarity): C is a friend of D, G, H. 34

Similarity Jaccard similarity example. Favorite interests:
User A: U_A = {Reading, Film, Painting, Swimming}
User B: U_B = {Reading, Film, Painting, Singing}
User C: U_C = {Reading, TV games}
J(U_A, U_B) = |{Reading, Film, Painting}| / |{Reading, Film, Painting, Swimming, Singing}| = 3/5 (the number of interests in common over the total number of interests the two people have)
J(U_A, U_C) = |{Reading}| / |{Reading, Film, Painting, Swimming, TV games}| = 1/5 35
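The worked example translates directly into Python's set operations. A minimal sketch:

```python
def jaccard(a, b):
    """J(U_i, U_j) = |intersection| / |union| of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

U_A = {"Reading", "Film", "Painting", "Swimming"}
U_B = {"Reading", "Film", "Painting", "Singing"}
U_C = {"Reading", "TV games"}

print(jaccard(U_A, U_B))  # 3/5 = 0.6
print(jaccard(U_A, U_C))  # 1/5 = 0.2
```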

Similarity Jaccard similarity special issues. With common interests {Reading, Film}:
J(U_A, U_B) = |{Reading, Film}| / |{Reading, Film, Painting, Swimming, Singing}| = 2/5
J(U_A, U_B) = |{Reading, Film}| / |{Reading, Film, Painting, Swimming, Singing, TV games}| = 2/6 = 1/3
Two possible choices of denominator: 1) the union of the two users' interests; or 2) all possible interests. PS: if the former is used, some information may be lost. 36

Similarity Cosine similarity: $\cos(U_i, U_j) = \dfrac{U_i \cdot U_j}{\|U_i\| \, \|U_j\|}$. Used to quantify the similarity between two vectors. $0 \le \cos(U_i, U_j) \le 1$ for non-negative entries; 1: the two vectors are identical in direction; 0: the two vectors have no common elements. 37

Similarity Cosine similarity example. Restaurant visit frequencies:
User A: U_A = {LG1: 3, Café: 5, McDonalds: 5}
User B: U_B = {LG1: 2, Café: 6, McDonalds: 4}
User C: U_C = {LG1: 10, Café: 1, McDonalds: 0}
cos(U_A, U_B) = (3·2 + 5·6 + 5·4) / (√(3² + 5² + 5²) · √(2² + 6² + 4²)) = 56 / (√59 · √56) ≈ 0.97
cos(U_A, U_C) = (3·10 + 5·1 + 5·0) / (√(3² + 5² + 5²) · √(10² + 1² + 0²)) = 35 / (√59 · √101) ≈ 0.45 38
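The same numbers can be reproduced in a few lines; the dictionaries below mirror the visit-frequency vectors on the slide:

```python
import math

def cosine(u, v):
    """cos(U_i, U_j) = (U_i . U_j) / (|U_i| |U_j|), over the shared key set."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

U_A = {"LG1": 3, "Cafe": 5, "McDonalds": 5}
U_B = {"LG1": 2, "Cafe": 6, "McDonalds": 4}
U_C = {"LG1": 10, "Cafe": 1, "McDonalds": 0}

print(round(cosine(U_A, U_B), 2))  # 0.97
print(round(cosine(U_A, U_C), 2))  # 0.45
```

Note that, unlike Jaccard similarity, cosine similarity uses the visit counts themselves, not just set membership.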

Tie Strength: Weak or Strong? Now, connections (links) are not all of the same strength. Interpersonal social networks in real life have strong ties (close friends) and weak ties (acquaintances), which shape community formation and information diffusion. The Strength of Weak Ties (Granovetter, 1973): occasional encounters with distant acquaintances provide new opportunities, e.g., in job searches. 39

Weak and Strong Ties 40

How does the strength of a tie influence diffusion? M. S. Granovetter: The Strength of Weak Ties, AJS, 1973. Finding a job through a contact whom one sees: frequently (2+/week): 16.7%; occasionally (more than once a year but < 2/week): 55.6%; rarely: 27.8%. But the length of the path is short: the contact directly works for / is the employer, or is connected directly to the employer. PS: Any real-life experience? 41

Zachary's Karate Club Dataset Zachary's Karate Club is a dataset that describes the social relationships in a university karate club, collected by Wayne W. Zachary in his paper An Information Flow Model for Conflict and Fission in Small Groups. 34 nodes represent the members of the club; 77 edges represent the friendships between the members. src: http://networkdata.ics.uci.edu/netdata/html/zacharykarate.html 42

Out-of-class activity 3 (due before Tutorial #3) Read the paper An Information Flow Model for Conflict and Fission in Small Groups (3400+ citations as of 2018): http://course.ece.ust.hk/elec6910q/referencepaper/An_Information_Flow_Model_for_Conflict_and_Fission.pdf 1. Give 3 points about its contributions. 2. Suggest 3 possible extensions using what we have learnt in the course. 3. Submit by Facebook post before tomorrow noon. 43

End of Lecture Questions / Comments? 44

Recommendation 45

Types of Recommendations 1. Image (Flickr) 2. Video (YouTube, Youku, Netflix) 3. Cuisine (Openrice, Dianping) 4. Friend/Member/Articles (Facebook, Renren, WeChat, Line, etc.) 5. Webpage/ bookmarks (Delicious) 6. Product (ebay, Amazon) 46

Recommendation Inputs 1. When users' interests/preferences are specified by the users themselves, recommend by those criteria. 2. Otherwise, recommend through social data, history, and behavioral data with machine learning and data mining techniques. Netflix recommendation system example: https://www.youtube.com/watch?v=nq2qtatuf7u 47

Common Techniques in Social Networks 1. Collaborative filtering (CF): understand user properties for recommendation, e.g., tagging of user-generated content. 2. K-NN based recommendation: understand the item and user properties for recommendation; similarities among items and users are calculated. 48

Collaborative Filtering 49

Collaborative Filtering (CF) 1. The most prominent approach, used by large commercial e-commerce sites: well understood, various algorithms and variations exist, applicable in many domains (books, movies, ...). 2. Basic assumption and idea: customers' tastes do not change much over time. 50

Collaborative Filtering (CF) Leveraging similarity. How it works: 1. Should item 1 be recommended to Tim, given the user-item matrix? 2. Two approaches: user-to-user (calculate user similarity) and item-to-item (calculate item similarity). 51

Collaborative Filtering (CF) User-to-user Finding similar users (with similar tastes), e.g., by Jaccard similarity: $J(U_i, U_j) = |U_i \cap U_j| / |U_i \cup U_j|$. Jane and Tim both liked item 2 and disliked item 3, so they have similar tastes; item 1 is therefore recommended to Tim (item 1 is liked by Jane). 52

Collaborative Filtering (CF) User-to-user User-based nearest neighbor: neighbors = similar users. Generate a prediction for an item i by analyzing ratings for i from users in u's neighborhood:

$\mathrm{pred}(u,i) = \bar{r}_u + \dfrac{\sum_{v \in \mathrm{neighbors}(u)} \mathrm{sim}(u,v) \cdot (r_{vi} - \bar{r}_v)}{\sum_{v \in \mathrm{neighbors}(u)} \mathrm{sim}(u,v)}$ 53
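A sketch of the user-based prediction formula in Python. The ratings and similarity values below are hypothetical, chosen only to illustrate the computation (they are not the slide's user-item matrix):

```python
def mean(d):
    return sum(d.values()) / len(d)

def predict_user_based(u, i, ratings, sim):
    """pred(u,i) = r_u + sum_v sim(u,v)*(r_vi - r_v) / sum_v sim(u,v)."""
    num = den = 0.0
    for v, r_v in ratings.items():
        if v == u or i not in r_v:     # only neighbors who rated item i
            continue
        s = sim[frozenset((u, v))]
        num += s * (r_v[i] - mean(r_v))  # deviation from v's mean rating
        den += s
    return mean(ratings[u]) + num / den

# Hypothetical ratings (rows = users, 1-5 scale) and precomputed similarities
ratings = {"Jane": {1: 5, 2: 4, 3: 1},
           "Tim":  {2: 5, 3: 1},
           "Don":  {1: 2, 2: 1, 3: 5}}
sim = {frozenset(("Tim", "Jane")): 0.9,
       frozenset(("Tim", "Don")): 0.1}

print(round(predict_user_based("Tim", 1, ratings, sim), 2))  # 4.43
```

Subtracting each neighbor's mean rating corrects for users who rate systematically high or low, which is why the formula uses $(r_{vi} - \bar{r}_v)$ rather than $r_{vi}$ directly.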

Collaborative Filtering (CF) Item-to-item Finding items that have similar subscribers. Dom and Sandra are two users who both like items 1 and 4: users who like item 4 also like item 1, so item 1 will be recommended to Tim. 54

Collaborative Filtering (CF) Item-to-item Item-based nearest neighbor: generate predictions based on similarities between items. The prediction for a user u and item i is a weighted sum of user u's ratings for the items most similar to i:

$\mathrm{pred}(u,i) = \dfrac{\sum_{j \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,j) \cdot r_{uj}}{\sum_{j \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,j)}$ 55
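The item-based formula can be sketched the same way; again, the item similarities and ratings below are hypothetical illustration values:

```python
def predict_item_based(u_ratings, i, item_sim):
    """pred(u,i) = sum_j sim(i,j)*r_uj / sum_j sim(i,j), over items u rated."""
    num = den = 0.0
    for j, r_uj in u_ratings.items():
        s = item_sim[frozenset((i, j))]
        num += s * r_uj                # weight each rating by item similarity
        den += s
    return num / den

# Hypothetical item-item similarities and one user's ratings
item_sim = {frozenset((1, 2)): 0.8, frozenset((1, 3)): 0.2}
tim = {2: 4, 3: 2}   # Tim rated item 2 -> 4, item 3 -> 2

print(round(predict_item_based(tim, 1, item_sim), 2))  # (0.8*4 + 0.2*2) / 1.0 = 3.6
```

In practice, item-item similarities change more slowly than user-user similarities, so they can be precomputed offline, which is one reason large sites favor the item-to-item variant.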

Example: Friendship Recommendation 1. Similarity among users can be found through the user-item matrix. 2. Recommend Don to Jane (as an online friend), since they have the most similar tastes (common interests). 56

K-nearest neighbors (K-NN) 57

Recommendation as a Classification Problem 1. Two classes: like or dislike. 2. Recommendation: find the items that will be liked. 3. Example: which clothes will be liked? 58

K-nearest neighbors (K-NN) 1. One of the simplest machine learning algorithms for classification. 2. Assign an object to the class most common among its k nearest neighbors, after some voting mechanism. 3. Different neighbors can have different weights, e.g., the nearest one has a higher weight (weighted by similarity). 59

K-NN: classifying a fish. Two classes: sea bass and salmon; with k = 3 (2 sea bass, 1 salmon), the fish is classified as sea bass. Three classes: sea bass, salmon and eel; with k = 5 (3 sea bass, 1 eel, 1 salmon), the fish is classified as sea bass. 60

K-NN: An algorithm to find users/objects with similar tastes/subscribers. Step 1: Collect a set of labeled samples (liked / disliked). Step 2: Find the k nearest neighbors, the k items with the highest similarity. Step 3: Classify the input item, e.g., like or dislike. If k = 3, then in this case the query instance is classified as positive ("liked"), since 2 of its 3 nearest neighbors are positive. 61
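The three steps above can be sketched as a tiny K-NN classifier; the 2-D feature points and labels below are hypothetical:

```python
import math
from collections import Counter

def knn_classify(query, samples, k=3):
    """Step 2: sort labeled samples by Euclidean distance to the query;
    Step 3: majority vote among the k nearest."""
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Step 1: a set of labeled samples (hypothetical 2-D feature points)
samples = [((1.0, 1.0), "liked"), ((1.2, 0.8), "liked"), ((0.9, 1.1), "liked"),
           ((4.0, 4.0), "disliked"), ((4.2, 3.9), "disliked")]

print(knn_classify((1.1, 1.0), samples, k=3))  # liked
```

This unweighted vote matches step 3 on the slide; the similarity-weighted variant mentioned earlier would add each neighbor's weight to its class total instead of counting votes equally.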