ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties


1 ELEC6910Q Analytics and Systems for Social Media and Big Data Applications, Lecture 3: Centrality, Similarity, and Strength Ties. Prof. James She, james.she@ust.hk

2 Last lecture

3 Selected works from Tutorial #2 From the "Betweenness vs Degree" scatter plot, it seems that, in general, the higher the betweenness, the higher the degree. On the other hand, high degree doesn't always mean high betweenness: some nodes have a total degree over 600 while their betweenness is still ~0 x 10^5. Work from Samuel Chan. Work from Tommy Lam.

4 Understanding Betweenness

5 Understanding Betweenness

6 Announcements 1. Tutorial #3 tomorrow! 2. More technical / practical programming for data analytics! 3. Make sure you know the Matlab basics, and Python soon. [Figure: technical challenges / fun over time, Wk 3, Wk 4, Wk 5]

7 Summary of this lecture 1. Centrality continued 2. Similarity and Tie Strength of Nodes 3. Introduction to Recommendation

8 Centrality continued

9 Recall: Adjacency Matrix [Figure: a social graph and its adjacency matrix]

10 Recall: Betweenness Centrality Intuition: how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops? Who has higher betweenness, X or Y? [Figure: graph with nodes X and Y]

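11 Betweenness Centrality Formally:
$$C_B(i) = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}$$
where $g_{jk}$ is the number of geodesics (shortest paths) connecting $j$ and $k$, and $g_{jk}(i)$ is the number of those geodesics that actor $i$ is on.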

12 Betweenness Centrality Why do C and D each have betweenness 1? They are both on shortest paths for the pairs (A, E) and (B, E), and so must share credit: ½ + ½ = 1. [Figure: example graph with nodes A, B, C, D, E]
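
The shared-credit rule can be checked with a small sketch. The edge list below is an assumption (the slide's figure did not survive transcription); it is one graph on A to E in which C and D both lie on the shortest paths for (A, E) and (B, E). The helper names `bfs_counts` and `betweenness` are hypothetical, and the brute-force pair loop is chosen for readability, not speed.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u is a predecessor of w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """C_B(i) = sum over pairs j<k of g_jk(i) / g_jk, for an undirected graph."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for j, k in combinations(sorted(adj), 2):
        dist_j, sigma_j = info[j]
        dist_k, sigma_k = info[k]
        if k not in dist_j:                # j and k are not connected
            continue
        for i in adj:
            if i in (j, k) or i not in dist_j or i not in dist_k:
                continue
            # i is on a shortest j-k path iff the two legs add up to d(j, k)
            if dist_j[i] + dist_k[i] == dist_j[k]:
                score[i] += sigma_j[i] * sigma_k[i] / sigma_j[k]
    return score

# Assumed example graph: A-B, B-C, B-D, C-E, D-E
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

scores = betweenness(adj)
print(scores)
```

On this assumed graph, C and D each receive ½ from the pair (A, E) and ½ from the pair (B, E), giving a betweenness of 1 for both.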

13 Betweenness vs Degree Centrality Data visualization: nodes are sized by degree and colored by betweenness. Can you spot nodes with high betweenness but relatively low degree? What about high degree but relatively low betweenness?

14 Closeness Centrality What if node importance is not simply due to the number of direct friends (degree centrality) or to being in between others (betweenness centrality), but to being close to everyone (closeness centrality)?

15 Closeness Centrality

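16 Closeness Centrality
$$C_C(v_i) = \left[ \frac{\sum_{j=1}^{n} g(v_i, v_j)}{n-1} \right]^{-1}$$
where $g(v_i, v_j)$ is the geodesic (shortest-path) distance between nodes $v_i$ and $v_j$, and $n$ is the number of nodes. [Figure: worked example on a graph with nodes A, B, C, D, E]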
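
A minimal sketch of closeness centrality, assuming the normalized form (the inverse of the average shortest-path distance to the other n - 1 nodes) and a connected, undirected graph. The path graph A-B-C-D-E and the helper names are assumptions for illustration; the slide's own example graph was lost in transcription.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj, v):
    """Inverse of the mean geodesic distance from v to all other nodes.

    Assumes the graph is connected, so every other node is reachable.
    """
    dist = bfs_distances(adj, v)
    total = sum(d for node, d in dist.items() if node != v)
    return (len(adj) - 1) / total

# Assumed example: a path graph A - B - C - D - E
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
closeness_scores = {v: closeness(adj, v) for v in adj}
print(closeness_scores)
```

In this assumed path graph the middle node C is closest to everyone, while the end nodes A and E have the lowest closeness.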

17 Closeness Centrality

18 Closeness vs Degree Centrality Data visualization: degree is denoted by size, closeness by color. Nodes with high closeness are located in the middle of the graph.

19 Eigenvector Centrality An aggregated metric that characterizes the "global" importance of a node, as opposed to its "local" importance, e.g., PageRank (used in Google's early search engine). A node is important because of the centralities of its neighbors.

20 Eigenvector Centrality Modified version: PageRank. PageRank is used by Google's search engine to rank web pages in terms of their relative importance. [Figure from wikipedia.com] The idea: a page p_i is relatively more important, with a higher PR(p_i), when it is linked to by many other important pages.

21 Eigenvector Centrality Consider a graph with a 5x5 adjacency matrix A. Let x be a 5x1 centrality vector of the nodes (in terms of degree).

22 Eigenvector Centrality Multiply the matrix A by the vector x: each entry of the result is the sum of the centralities of that node's neighbors.

23 Eigenvector Centrality What if the process keeps repeating? x is updated repeatedly and eventually reaches an equilibrium, when the in/out flow is balanced with the neighbors. The final x = {x_1, x_2, x_3, x_4, x_5} captures the centrality.
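
The repeated update can be sketched as a power iteration. This is a minimal illustration, not the lecture's code: the 5x5 adjacency matrix is an assumed example (the slide's matrix was not transcribed), the function name is hypothetical, and the vector is rescaled to unit length after every step so the values stay bounded. The example graph contains a triangle, which avoids the oscillation that plain power iteration shows on bipartite graphs.

```python
import math

def eigenvector_centrality(A, iterations=1000, tol=1e-12):
    """Repeat x <- A x (then rescale to unit length) until x stops changing."""
    n = len(A)
    # Start from the degree of each node, as suggested in the slides.
    x = [float(sum(row)) for row in A]
    norm = math.sqrt(sum(v * v for v in x))
    x = [v / norm for v in x]
    lam = 0.0
    for _ in range(iterations):
        # Each entry of A x is the sum of the neighbors' current centralities.
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(v * v for v in y))   # estimate of the largest eigenvalue
        y = [v / lam for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return x, lam

# Assumed undirected example graph: edges 0-1, 0-2, 1-2, 2-3, 3-4
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
centrality, eigenvalue = eigenvector_centrality(A)
print(centrality, eigenvalue)
```

At equilibrium the vector satisfies A x = λ x, so it is an eigenvector of A for the largest eigenvalue λ.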

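24 Eigenvector Centrality Recall linear algebra basics. Eigenvectors (for a square m x m matrix A): $A v = \lambda v$, where $v$ is a (right) eigenvector and $\lambda$ is the corresponding eigenvalue. There are at most m distinct eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_m$. For a symmetric matrix (such as the adjacency matrix of an undirected graph), eigenvectors for distinct eigenvalues are orthogonal: $\lambda_1 \neq \lambda_2 \Rightarrow v_1 \cdot v_2 = 0$.

25 Eigenvector Centrality Find the solution by eigenvalue decomposition: $A = U \Lambda U^{-1}$, where $\Lambda$ is diagonal. Columns of $U$ are eigenvectors of $A$; diagonal elements of $\Lambda$ are eigenvalues of $A$. The effect of the largest eigenvalue is the largest.

26 Eigenvector Centrality The centrality vector is the eigenvector corresponding to the largest eigenvalue $\lambda$: $Ax = \lambda x$, e.g., in Matlab: [vector, value] = eig(A). [Figure labels: Node C and D; Eigenvalue decomposition]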

27 Importance of Nodes In summary:
Closeness Centrality: They have the shortest paths to other nodes.
Degree Centrality: It has the most direct connections.
Eigenvector Centrality: Its neighbors are the most important.
Betweenness Centrality: It connects the two nodes to all other nodes.

28 10 min break

29 Strength of Ties and Similarity
Measurement | Description | Applications
Similarity | Indicates the similarity between nodes by their common attributes (e.g., contacts or interests) | Determine the tie strength, community of nodes, recommendations, etc.
Tie Strength | Indicates the strength of the link between nodes (e.g., frequency of interactions and duration of encounters) | Determine whether a link is a weak or strong connection, and other hidden and missing info.

30 Similarity: Jaccard similarity
$$J(U_i, U_j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$$
Used to quantify the similarity between 2 sets $U_i$ and $U_j$. $0 \le J(U_i, U_j) \le 1$; 1: the 2 sets are identical, 0: the 2 sets have no common elements.

31 Recall User Profiles in Social Media

32 Learning from Profiles: tie strength based on attributes. Descriptive attributes of nodes (e.g., interests, common friends, etc.). Consider node similarity based on these attributes, e.g., users A and B have more common interests. [Figure: user A (Reading, Film, Painting, Swimming), user B (Reading, Film, Singing), user C (TV Game, Hiking); strong tie = high similarity, weak tie = low similarity]

33 Learning from Interactions: tie strength based on contacts. Structural features: # of common friends, Jaccard similarity of the friend sets. More common friends → higher similarity. Users A and B have more common friends. [Figure: user A (friend of D, E, F), user B (friend of D, E, F), user C (friend of D, G, H); strong tie = high similarity, weak tie = low similarity]

34 Similarity: Jaccard similarity example (favorite interests)
User A: U_A = {Reading, Film, Painting, Swimming}
User B: U_B = {Reading, Film, Painting, Singing}
User C: U_C = {Reading, TV games}
$$J(U_A, U_B) = \frac{|\{\text{Reading, Painting, Film}\}|}{|\{\text{Reading, Film, Painting, Swimming, Singing}\}|} = \frac{\text{\# of interests in common}}{\text{total \# of interests that the two people have}} = \frac{3}{5}$$
$$J(U_A, U_C) = \frac{|\{\text{Reading}\}|}{|\{\text{Reading, Film, Painting, Swimming, TV games}\}|} = \frac{1}{5}$$
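
The Jaccard example can be reproduced with Python sets. This is a minimal sketch: the function name `jaccard` is hypothetical, and returning 0.0 for two empty sets is an assumption, since the slides do not define that case.

```python
def jaccard(a, b):
    """|a ∩ b| / |a ∪ b| for two sets; defined here as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

U_A = {"Reading", "Film", "Painting", "Swimming"}
U_B = {"Reading", "Film", "Painting", "Singing"}
U_C = {"Reading", "TV games"}

j_ab = jaccard(U_A, U_B)   # 3 common interests out of 5 in the union
j_ac = jaccard(U_A, U_C)   # 1 common interest out of 5 in the union
print(j_ab, j_ac)
```

The denominator here is the union of the two users' interests; using the set of all possible interests instead would need that larger set as an extra argument.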

35 Similarity: Jaccard similarity special issues
$$J(U_A, U_B) = \frac{|\{\text{Reading, Film}\}|}{|\{\text{Reading, Film, Painting, Swimming, Singing}\}|} = \frac{2}{5}$$
$$J(U_A, U_B) = \frac{|\{\text{Reading, Film}\}|}{|\{\text{Reading, Film, Painting, Swimming, Singing, TV games}\}|} = \frac{2}{6} = \frac{1}{3}$$
Possible choices of denominator: 1) the union of the 2 users' interests; or 2) all possible interests? PS: if the former is used, some information may be lost.

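36 Similarity: Cosine similarity
$$C(U_i, U_j) = \frac{U_i \cdot U_j}{\|U_i\| \, \|U_j\|}$$
Used to quantify the similarity between two sets represented as vectors (e.g., of frequencies). $0 \le C(U_i, U_j) \le 1$ for non-negative vectors; 1: the 2 vectors point in the same direction (e.g., the 2 sets are identical), 0: the 2 sets have no common elements.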

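37 Similarity: Cosine similarity example (restaurant visit frequency)
User A: U_A = {LG1: 3, Café: 5, McDonalds: 5}
User B: U_B = {LG1: 2, Café: 6, McDonalds: 4}
User C: U_C = {LG1: 10, Café: 1, McDonalds: 0}
$$C(U_A, U_B) = \frac{3 \cdot 2 + 5 \cdot 6 + 5 \cdot 4}{\sqrt{3^2 + 5^2 + 5^2}\,\sqrt{2^2 + 6^2 + 4^2}} = \frac{56}{\sqrt{59}\,\sqrt{56}} \approx 0.97$$
$$C(U_A, U_C) = \frac{3 \cdot 10 + 5 \cdot 1 + 5 \cdot 0}{\sqrt{3^2 + 5^2 + 5^2}\,\sqrt{10^2 + 1^2 + 0^2}} = \frac{35}{\sqrt{59}\,\sqrt{101}} \approx 0.45$$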
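
The restaurant example can be checked with a short sketch that stores each user's visit frequencies in a dictionary. The function name `cosine_similarity` is hypothetical, and treating a missing key as frequency 0 and returning 0.0 for an all-zero vector are assumptions made for this illustration.

```python
import math

def cosine_similarity(u, v):
    """Dot product of two frequency vectors divided by the product of their lengths."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0            # assumed convention for an all-zero vector
    return dot / (norm_u * norm_v)

U_A = {"LG1": 3, "Café": 5, "McDonalds": 5}
U_B = {"LG1": 2, "Café": 6, "McDonalds": 4}
U_C = {"LG1": 10, "Café": 1, "McDonalds": 0}

sim_ab = cosine_similarity(U_A, U_B)
sim_ac = cosine_similarity(U_A, U_C)
print(round(sim_ab, 3), round(sim_ac, 3))
```

Users A and B visit the same places in similar proportions, so their similarity is close to 1; user C's visits are concentrated on LG1, so the similarity with A is much lower.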

38 Tie Strength: Weak or Strong? Connections (links) do not all have the same strength. Interpersonal social networks in real life: strong ties (close friends) and weak ties (acquaintances). Relevant to community formation and information diffusion. Strength of Weak Ties (Granovetter, 1973): occasional encounters with distant acquaintances provide new opportunities in job search.

39 Weak and Strong Ties

40 How does the strength of a tie influence diffusion? M. S. Granovetter, "The Strength of Weak Ties", AJS, 1973. Finding a job through a contact who is seen: frequently (2+/week): 16.7%; occasionally (more than once a year but < 2/week): 55.6%; rarely: 27.8%. But the length of the path is short: the contact directly works for, is, or is connected directly to the employer. PS: Any real-life experience?

41 Zachary's Karate Club Dataset Zachary's Karate Club is a dataset of social relationships described by Wayne W. Zachary in his paper "An Information Flow Model for Conflict and Fission in Small Groups". 34 nodes represent the members of the club; 77 edges represent the friendships between the members. src:

42 Out-class activity 3 (due before Tutorial #3) Read the paper "An Information Flow Model for Conflict and Fission in Small Groups" (3400+ citations in 2018), Conflict_and_Fission.pdf. 1. 3 points about its contributions. 2. 3 possible extensions using what we learnt from the course. 3. Submit by Facebook post before tomorrow noon.

43 End of Lecture. Questions / Comments?

44 Recommendation

45 Types of Recommendations 1. Image (Flickr) 2. Video (YouTube, Youku, Netflix) 3. Cuisine (Openrice, Dianping) 4. Friend/Member/Articles (Facebook, Renren, WeChat, Line, etc.) 5. Webpage/bookmarks (Delicious) 6. Product (eBay, Amazon)

46 Recommendation Inputs 1. When users' interests/preferences are specified by the users, recommend by criteria. 2. Recommend through social data, history, and behavioral data with machine learning and data mining techniques. [Figure: Netflix recommendation system example]

47 Common Techniques in Social Networks 1. Collaborative filtering (CF): understand user properties for recommendation, e.g., tagging for user-generated content. 2. K-NN based recommendation: understand the item and user properties for recommendation; similarities among items and users are calculated.

48 Collaborative Filtering

49 Collaborative Filtering (CF) 1. The most prominent approach, used by large commercial e-commerce sites: well understood, with various algorithms and variations; applicable in many domains (books, movies, ...). 2. Basic assumption and idea: customers' tastes do not change much with time.

50 Collaborative Filtering (CF): leveraging similarity. How it works: 1. Should item 1 be recommended to Tim, based on the user-item matrix? 2. Two approaches: user to user (calculate user similarity) and item to item (calculate item similarity).

51 Collaborative Filtering (CF): user-to-user. Find similar users (with similar tastes), e.g., by Jaccard similarity:
$$J(U_i, U_j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$$
Jane and Tim both liked item 2 and disliked item 3, so they have similar tastes. Item 1 is recommended to Tim (item 1 is liked by Jane).

52 Collaborative Filtering (CF): user-to-user, user-based nearest neighbor. Neighbors = similar users. Generate a prediction for an item $i$ by analyzing the ratings for $i$ from users in $u$'s neighborhood:
$$pred(u, i) = \bar{r}_u + \frac{\sum_{v \in neighbors(u)} sim(u, v)\,(r_{vi} - \bar{r}_v)}{\sum_{v \in neighbors(u)} sim(u, v)}$$
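
A minimal sketch of the user-based prediction, assuming positive similarity scores that are already computed (for example with Jaccard or cosine similarity). The ratings, the similarity values, and the function name `predict_user_based` are all hypothetical illustration data, not taken from the lecture's user-item matrix.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def predict_user_based(ratings, sims, u, item):
    """pred(u, i) = mean(r_u) + sum_v sim(u,v) * (r_vi - mean(r_v)) / sum_v sim(u,v).

    Only neighbors that actually rated the item contribute to the sums.
    """
    r_u = mean(ratings[u].values())
    num = 0.0
    den = 0.0
    for v, s in sims[u].items():
        if item in ratings[v]:
            num += s * (ratings[v][item] - mean(ratings[v].values()))
            den += s
    if den == 0:
        return r_u            # no neighbor rated the item: fall back to u's mean
    return r_u + num / den

# Hypothetical ratings on a 1-5 scale; Tim has not rated item1.
ratings = {
    "Tim":  {"item2": 5, "item3": 1},
    "Jane": {"item1": 5, "item2": 5, "item3": 2},
    "Dom":  {"item1": 2, "item2": 3, "item3": 4},
}
# Hypothetical similarities between Tim and his neighbors.
sims = {"Tim": {"Jane": 0.9, "Dom": 0.1}}

pred_tim_item1 = predict_user_based(ratings, sims, "Tim", "item1")
print(pred_tim_item1)
```

Subtracting each neighbor's mean rating before averaging corrects for users who rate everything high or low, which is why the formula adds the weighted deviations back onto Tim's own mean.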

53 Collaborative Filtering (CF): item-to-item. Find items that have similar subscribers. Dom and Sandra are 2 users who both like items 1 and 4. Users who like item 4 also like item 1, so item 1 will be recommended to Tim.

54 Collaborative Filtering (CF): item-to-item, item-based nearest neighbor. Generate predictions based on similarities between items. The prediction for a user $u$ and item $i$ is a weighted sum of user $u$'s ratings for the items most similar to $i$:
$$pred(u, i) = \frac{\sum_{j \in ratedItems(u)} sim(i, j)\, r_{uj}}{\sum_{j \in ratedItems(u)} sim(i, j)}$$
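
A minimal sketch of the item-based prediction: a similarity-weighted average of the user's own ratings. The ratings, the item-item similarity values, and the function name `predict_item_based` are hypothetical; in practice the similarities would be computed from the user-item matrix, for example with cosine similarity between item columns.

```python
def predict_item_based(user_ratings, item_sims, target):
    """Weighted average of the user's ratings r_uj, weighted by sim(target, j)."""
    num = 0.0
    den = 0.0
    for j, r_uj in user_ratings.items():
        s = item_sims[target].get(j, 0.0)   # unknown pairs count as similarity 0
        num += s * r_uj
        den += s
    if den == 0:
        return None           # no rated item is similar to the target
    return num / den

# Hypothetical data: Tim's ratings for the items he has already rated.
tim_ratings = {"item2": 5, "item3": 1, "item4": 4}
# Hypothetical similarities between item1 and the other items.
item_sims = {"item1": {"item2": 0.2, "item3": 0.1, "item4": 0.7}}

pred_item1 = predict_item_based(tim_ratings, item_sims, "item1")
print(pred_item1)
```

Because item 4 is the most similar to item 1 in this assumed data, Tim's rating of item 4 dominates the prediction.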

55 Example: Friendship Recommendation 1. Similarity among users can be found through the user-item matrix. 2. Recommend Don to Jane (as an online friend), since they have the most similar tastes (common interests). [Figure: Jane, Tim, Don]

56 K-nearest neighbors (K-NN)

57 Recommendation as a Classification Problem 1. 2 classes: like or dislike. 2. Recommendation: find items that will be liked. 3. Example: which clothes will be liked?

58 K-nearest neighbors (K-NN) 1. The simplest machine learning algorithm for classification. 2. Assign an object to the class most common among its k nearest neighbors, using some voting mechanism. 3. Different neighbors could have different weights, e.g., the nearest one has a higher weight (by similarity).

59 K-NN: classifying a fish. 2 classes (sea bass and salmon), k = 3: the neighbors are 2 sea bass and 1 salmon, so the fish is classified as sea bass. 3 classes (sea bass, salmon and eel), k = 5: the neighbors are 3 sea bass, 1 eel and 1 salmon, so the fish is classified as sea bass.

60 K-NN: an algorithm to find users/objects with similar tastes/subscribers. Step 1: Collect a set of labeled samples (liked or disliked). Step 2: Find the k nearest neighbors, i.e., the k items with the highest similarity to the query item. Step 3: Classify the input item, e.g., as liked or disliked. If K = 3, then in this case the query instance will be classified as positive (liked), since 2 of the 3 nearest neighbors are positive. [Figure: Tim's liked and disliked items around the query item]
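
The three steps can be sketched as a small K-NN classifier with unweighted majority voting. The 2-D feature vectors, the labels, and the function name `knn_classify` are hypothetical illustration data; Euclidean distance stands in for the similarity measure, which is an assumption, since the slides do not fix one.

```python
import math
from collections import Counter

def knn_classify(samples, query, k=3):
    """Label the query by majority vote among its k nearest labeled samples."""
    # Step 2: sort the labeled samples by distance to the query and keep the k closest.
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    # Step 3: the most common label among those neighbors wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Step 1: a set of labeled samples (hypothetical item features and Tim's feedback).
samples = [
    ((1.1, 1.0), "liked"),
    ((0.9, 1.2), "liked"),
    ((1.3, 0.8), "disliked"),
    ((3.0, 3.0), "disliked"),
    ((3.2, 2.9), "disliked"),
    ((0.0, 3.0), "liked"),
]
query_item = (1.0, 1.0)

label_k3 = knn_classify(samples, query_item, k=3)
print(label_k3)
```

With K = 3, two of the three nearest samples are liked, so the query item is classified as liked. A weighted vote, where closer neighbors count more, would replace the `Counter` tally with a sum of weights per label.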

Web Structure Mining Nodes, Links and Influence

Web Structure Mining Nodes, Links and Influence Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 21, 2014 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 21, 2014 1 / 52 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

Kristina Lerman USC Information Sciences Institute

Kristina Lerman USC Information Sciences Institute Rethinking Network Structure Kristina Lerman USC Information Sciences Institute Università della Svizzera Italiana, December 16, 2011 Measuring network structure Central nodes Community structure Strength

More information

Collaborative Filtering. Radek Pelánek

Collaborative Filtering. Radek Pelánek Collaborative Filtering Radek Pelánek 2017 Notes on Lecture the most technical lecture of the course includes some scary looking math, but typically with intuitive interpretation use of standard machine

More information

Collaborative Filtering

Collaborative Filtering Collaborative Filtering Nicholas Ruozzi University of Texas at Dallas based on the slides of Alex Smola & Narges Razavian Collaborative Filtering Combining information among collaborating entities to make

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 29-30, 2015 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 29-30, 2015 1 / 61 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

Complex Social System, Elections. Introduction to Network Analysis 1

Complex Social System, Elections. Introduction to Network Analysis 1 Complex Social System, Elections Introduction to Network Analysis 1 Complex Social System, Network I person A voted for B A is more central than B if more people voted for A In-degree centrality index

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Popularity Recommendation Systems Predicting user responses to options Offering news articles based on users interests Offering suggestions on what the user might like to buy/consume

More information

Data Mining Recitation Notes Week 3

Data Mining Recitation Notes Week 3 Data Mining Recitation Notes Week 3 Jack Rae January 28, 2013 1 Information Retrieval Given a set of documents, pull the (k) most similar document(s) to a given query. 1.1 Setup Say we have D documents

More information

Class President: A Network Approach to Popularity. Due July 18, 2014

Class President: A Network Approach to Popularity. Due July 18, 2014 Class President: A Network Approach to Popularity Due July 8, 24 Instructions. Due Fri, July 8 at :59 PM 2. Work in groups of up to 3 3. Type up the report, and submit as a pdf on D2L 4. Attach the code

More information

DS504/CS586: Big Data Analytics Graph Mining II

DS504/CS586: Big Data Analytics Graph Mining II Welcome to DS504/CS586: Big Data Analytics Graph Mining II Prof. Yanhua Li Time: 6:00pm 8:50pm Mon. and Wed. Location: SL105 Spring 2016 Reading assignments We will increase the bar a little bit Please

More information

CS246 Final Exam, Winter 2011

CS246 Final Exam, Winter 2011 CS246 Final Exam, Winter 2011 1. Your name and student ID. Name:... Student ID:... 2. I agree to comply with Stanford Honor Code. Signature:... 3. There should be 17 numbered pages in this exam (including

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

Introduction to Link Prediction

Introduction to Link Prediction Introduction to Link Prediction Machine Learning and Modelling for Social Networks Lloyd Sanders, Olivia Woolley, Iza Moize, Nino Antulov-Fantulin D-GESS: Computational Social Science COSS Overview What

More information

Techniques for Dimensionality Reduction. PCA and Other Matrix Factorization Methods

Techniques for Dimensionality Reduction. PCA and Other Matrix Factorization Methods Techniques for Dimensionality Reduction PCA and Other Matrix Factorization Methods Outline Principle Compoments Analysis (PCA) Example (Bishop, ch 12) PCA as a mixture model variant With a continuous latent

More information

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search 6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search Daron Acemoglu and Asu Ozdaglar MIT September 30, 2009 1 Networks: Lecture 7 Outline Navigation (or decentralized search)

More information

DS504/CS586: Big Data Analytics Graph Mining II

DS504/CS586: Big Data Analytics Graph Mining II Welcome to DS504/CS586: Big Data Analytics Graph Mining II Prof. Yanhua Li Time: 6-8:50PM Thursday Location: AK233 Spring 2018 v Course Project I has been graded. Grading was based on v 1. Project report

More information

Matrix Factorization and Collaborative Filtering

Matrix Factorization and Collaborative Filtering 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Matrix Factorization and Collaborative Filtering MF Readings: (Koren et al., 2009)

More information

CSE 494/598 Lecture-4: Correlation Analysis. **Content adapted from last year s slides

CSE 494/598 Lecture-4: Correlation Analysis. **Content adapted from last year s slides CSE 494/598 Lecture-4: Correlation Analysis LYDIA MANIKONDA HT TP://WWW.PUBLIC.ASU.EDU/~LMANIKON / **Content adapted from last year s slides Announcements Project-1 Due: February 12 th 2016 Analysis report:

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS. The original slides can be accessed at: www.mmds.org Customer

More information

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci Link Analysis Information Retrieval and Data Mining Prof. Matteo Matteucci Hyperlinks for Indexing and Ranking 2 Page A Hyperlink Page B Intuitions The anchor text might describe the target page B Anchor

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,

More information

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let s put it to use!

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let s put it to use! Data Mining The art of extracting knowledge from large bodies of structured data. Let s put it to use! 1 Recommendations 2 Basic Recommendations with Collaborative Filtering Making Recommendations 4 The

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 5: Multi-class and preference learning Juho Rousu 11. October, 2017 Juho Rousu 11. October, 2017 1 / 37 Agenda from now on: This week s theme: going

More information

Nonlinear Dimensionality Reduction

Nonlinear Dimensionality Reduction Nonlinear Dimensionality Reduction Piyush Rai CS5350/6350: Machine Learning October 25, 2011 Recap: Linear Dimensionality Reduction Linear Dimensionality Reduction: Based on a linear projection of the

More information

Metric-based classifiers. Nuno Vasconcelos UCSD

Metric-based classifiers. Nuno Vasconcelos UCSD Metric-based classifiers Nuno Vasconcelos UCSD Statistical learning goal: given a function f. y f and a collection of eample data-points, learn what the function f. is. this is called training. two major

More information

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018 Lab 8: Measuring Graph Centrality - PageRank Monday, November 5 CompSci 531, Fall 2018 Outline Measuring Graph Centrality: Motivation Random Walks, Markov Chains, and Stationarity Distributions Google

More information

Degree Distribution: The case of Citation Networks

Degree Distribution: The case of Citation Networks Network Analysis Degree Distribution: The case of Citation Networks Papers (in almost all fields) refer to works done earlier on same/related topics Citations A network can be defined as Each node is

More information

Link Analysis and Web Search

Link Analysis and Web Search Link Analysis and Web Search Episode 11 Baochun Li Professor Department of Electrical and Computer Engineering University of Toronto Link Analysis and Web Search (Chapter 13, 14) Information networks and

More information

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information

More information

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary

More information

Face Recognition. Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition

Face Recognition. Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition ace Recognition Identify person based on the appearance of face CSED441:Introduction to Computer Vision (2017) Lecture10: Subspace Methods and ace Recognition Bohyung Han CSE, POSTECH bhhan@postech.ac.kr

More information

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task Binary Principal Component Analysis in the Netflix Collaborative Filtering Task László Kozma, Alexander Ilin, Tapani Raiko first.last@tkk.fi Helsinki University of Technology Adaptive Informatics Research

More information

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria Community Detection fundamental limits & efficient algorithms Laurent Massoulié, Inria Community Detection From graph of node-to-node interactions, identify groups of similar nodes Example: Graph of US

More information

Graph Helmholtzian and Rank Learning

Graph Helmholtzian and Rank Learning Graph Helmholtzian and Rank Learning Lek-Heng Lim NIPS Workshop on Algebraic Methods in Machine Learning December 2, 2008 (Joint work with Xiaoye Jiang, Yuan Yao, and Yinyu Ye) L.-H. Lim (NIPS 2008) Graph

More information

Andriy Mnih and Ruslan Salakhutdinov

Andriy Mnih and Ruslan Salakhutdinov MATRIX FACTORIZATION METHODS FOR COLLABORATIVE FILTERING Andriy Mnih and Ruslan Salakhutdinov University of Toronto, Machine Learning Group 1 What is collaborative filtering? The goal of collaborative

More information

LAPLACIAN MATRIX AND APPLICATIONS

LAPLACIAN MATRIX AND APPLICATIONS LAPLACIAN MATRIX AND APPLICATIONS Alice Nanyanzi Supervisors: Dr. Franck Kalala Mutombo & Dr. Simukai Utete alicenanyanzi@aims.ac.za August 24, 2017 1 Complex systems & Complex Networks 2 Networks Overview

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Graph and Network Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Methods Learnt Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve

More information

Statistical Problem. . We may have an underlying evolving system. (new state) = f(old state, noise) Input data: series of observations X 1, X 2 X t

Statistical Problem. . We may have an underlying evolving system. (new state) = f(old state, noise) Input data: series of observations X 1, X 2 X t Markov Chains. Statistical Problem. We may have an underlying evolving system (new state) = f(old state, noise) Input data: series of observations X 1, X 2 X t Consecutive speech feature vectors are related

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality

More information

Data Science Mastery Program

Data Science Mastery Program Data Science Mastery Program Copyright Policy All content included on the Site or third-party platforms as part of the class, such as text, graphics, logos, button icons, images, audio clips, video clips,

More information

Chapter 5. Divide and Conquer. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 5. Divide and Conquer. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 5 Divide and Conquer Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve each

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu November 16, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

CSE 546 Final Exam, Autumn 2013

CSE 546 Final Exam, Autumn 2013 CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,

More information

PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper)

PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper) PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper) In class, we saw this graph, with each node representing people who are following each other on Twitter: Our

More information

Introduction to Social Network Analysis PSU Quantitative Methods Seminar, June 15

Introduction to Social Network Analysis PSU Quantitative Methods Seminar, June 15 Introduction to Social Network Analysis PSU Quantitative Methods Seminar, June 15 Jeffrey A. Smith University of Nebraska-Lincoln Department of Sociology Course Website https://sites.google.com/site/socjasmith/teaching2/psu_social_networks_seminar

More information

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic

More information
