ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength of Ties
1 ELEC6910Q Analytics and Systems for Social Media and Big Data Applications, Lecture 3: Centrality, Similarity, and Strength of Ties. Prof. James She, james.she@ust.hk
2 Last lecture
3 Selected works from Tutorial #2: From the "Betweenness vs Degree" scatter plot, it seems that, in general, the higher the betweenness, the higher the degree. On the other hand, a high degree does not always mean a high betweenness: some nodes have over 600 total degrees while still having a betweenness of ~0 (on the ×10^5 scale of the plot). Work from Samuel Chan; work from Tommy Lam.
4 Understanding Betweenness
5 Understanding Betweenness
6 Announcements: 1. Tutorial #3 tomorrow! 2. More technical / practical programming for data analytics! 3. Make sure you know the basics of Matlab and Python soon. [Figure: technical challenges / fun vs. time over Wk 3, Wk 4, Wk 5]
7 Summary of this lecture: 1. Centrality continued 2. Similarity and Tie Strength of Nodes 3. Introduction to Recommendation
8 Centrality continued
9 Recall: Adjacency Matrix [Figure: a social graph and its adjacency matrix]
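A minimal Python sketch of building an adjacency matrix from an edge list. The five-node graph below is a hypothetical example, since the slide's graph is only shown as a figure.

```python
# Hypothetical undirected social graph given as an edge list (not the slide's graph).
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

def adjacency_matrix(nodes, edges):
    """Return the n x n 0/1 adjacency matrix as a list of lists."""
    index = {name: k for k, name in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        A[i][j] = 1  # undirected graph: set both entries,
        A[j][i] = 1  # so the matrix is symmetric
    return A

A = adjacency_matrix(nodes, edges)
for name, row in zip(nodes, A):
    print(name, row)
```

Row sums of A give each node's degree (degree centrality); for a directed graph, only `A[i][j]` would be set.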
10 Recall: Betweenness Centrality. Intuition: how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops? Who has higher betweenness, X or Y? [Figure: example graph with nodes X and Y]
11 Betweenness Centrality, or formally: $C_B(i) = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}$, where $g_{jk}$ = # of geodesics (shortest paths) connecting j and k, and $g_{jk}(i)$ = # of those geodesics that actor i is on.
12 Betweenness Centrality: why do C and D each have betweenness 1? [Figure: graph with nodes A, B, C, D, E] They are both on shortest paths for the pairs (A,E) and (B,E), and so must share credit: ½ + ½ = 1.
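The formula $C_B(i) = \sum_{j<k} g_{jk}(i)/g_{jk}$ can be checked by brute force: a BFS from every node counts geodesics, and node i lies on $g_{ji} \cdot g_{ik}$ of the j-k geodesics whenever $d(j,i) + d(i,k) = d(j,k)$. The edge list is our assumed reconstruction of the figure (edges A-B, A-C, A-D, B-C, B-D, C-E, D-E), chosen because it reproduces the slide's claim; the real figure may differ.

```python
from collections import deque
from itertools import combinations

# Assumed reconstruction of the slide's figure: C and D each bridge {A, B} and E.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "E"), ("D", "E")]

def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def bfs_counts(graph, source):
    """Hop distances and number of geodesics from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a geodesic
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(graph):
    """Unnormalized C_B(i) = sum over pairs j<k of g_jk(i) / g_jk."""
    info = {s: bfs_counts(graph, s) for s in graph}
    score = {v: 0.0 for v in graph}
    for j, k in combinations(sorted(graph), 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue                     # j and k are disconnected
        g_jk = sigma_j[k]
        for i in graph:
            if i in (j, k):
                continue
            dist_i, sigma_i = info[i]
            # i lies on a j..k geodesic iff d(j,i) + d(i,k) == d(j,k)
            if i in dist_j and k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                score[i] += sigma_j[i] * sigma_i[k] / g_jk
    return score

scores = betweenness(build_graph(edges))
print(scores)  # C -> 1.0 and D -> 1.0; A, B and E each get 1/3 from the pair (C, D)
```

This O(n^3) version is only meant for small teaching graphs; Brandes' algorithm is the usual choice for large networks.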
13 Betweenness vs Degree Centrality. Data visualization: nodes are sized by degree and colored by betweenness. Can you spot nodes with high betweenness but relatively low degree? What about high degree but relatively low betweenness?
14 Closeness Centrality. What if a node's importance is not simply due to the number of direct friends (degree centrality) or being in between others (betweenness centrality), but to being close to everyone (closeness centrality)?
15 Closeness Centrality
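16 Closeness Centrality: $C_C(v_i) = \left[\frac{\sum_{j=1}^{n} g(v_i, v_j)}{n-1}\right]^{-1}$, where $g(v_i, v_j)$ is the geodesic (shortest-path) distance between $v_i$ and $v_j$, and n is the number of nodes. [Figure: worked example on a graph with nodes A, B, C, D, E]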
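A minimal sketch of the closeness formula, computing geodesic distances by BFS. The path graph A-B-C-D-E is a hypothetical example (the slide's figure shows nodes A to E, but its edges are not recoverable), and the code assumes a connected graph.

```python
from collections import deque

# Hypothetical example: a path graph A - B - C - D - E.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
         "D": ["C", "E"], "E": ["D"]}

def geodesic_distances(graph, source):
    """BFS hop counts g(source, v) to every reachable node v."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(graph, v):
    """C_C(v) = [ sum_j g(v, v_j) / (n - 1) ]^(-1); assumes a connected graph."""
    dist = geodesic_distances(graph, v)
    n = len(graph)
    return (n - 1) / sum(dist.values())

for v in graph:
    print(v, round(closeness(graph, v), 3))
```

For A the distances are 1 + 2 + 3 + 4 = 10, so $C_C(A) = 4/10 = 0.4$, while the middle node C gets 4/6 ≈ 0.667, consistent with high-closeness nodes sitting in the middle of the graph.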
17 Closeness Centrality
18 Closeness vs Degree Centrality. Data visualization: degree denoted by size, closeness denoted by color. Nodes with high closeness are located in the middle of the graph.
19 Eigenvector Centrality: an aggregated metric that characterizes the "global" (as opposed to "local") importance of a node, e.g., PageRank (used in Google's early search engine). A node's importance comes from the centralities of its neighbors.
20 Eigenvector Centrality, modified version: PageRank. PageRank is used by Google's search engine to rank web pages by their relative importance (source: wikipedia.com). The idea: a page $p_i$ is relatively more important, with a higher $PR(p_i)$, when it is linked to by many other important pages.
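The idea can be sketched with the common damped formulation $PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \to p_i} \frac{PR(p_j)}{L(p_j)}$, where $L(p_j)$ is the number of out-links of $p_j$. The damping factor d = 0.85 and the four-page link graph are assumptions for illustration, not taken from the slides; every page is given at least one out-link so the sketch need not handle dangling pages.

```python
# Minimal PageRank sketch by power iteration.
# Assumptions: damping factor d = 0.85 (a common choice, not from the slide)
# and a hypothetical 4-page link graph in which every page has an out-link.
links = {
    "p1": ["p2", "p3"],
    "p2": ["p3"],
    "p3": ["p1"],
    "p4": ["p3"],
}

def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}          # start from a uniform vector
    for _ in range(max_iter):
        new = {p: (1.0 - d) / n for p in pages}
        for p, outs in links.items():
            share = pr[p] / len(outs)         # p splits its rank over its out-links
            for q in outs:
                new[q] += d * share
        if sum(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Page p3, which is linked to by the three other pages, ends up with the highest score, while p4 (no in-links) keeps only the baseline (1 - d)/N.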
21 Eigenvector Centrality: consider the graph with a 5×5 adjacency matrix A. Let x be a 5×1 centrality vector of the nodes (initialized with their degrees).
22 Eigenvector Centrality: multiply the matrix A by the vector x. Each resulting entry, $(Ax)_i = \sum_j A_{ij} x_j$, is the sum of the centralities of node i's neighbors.
23 Eigenvector Centrality: what if the process keeps repeating? x is updated repeatedly (with x rescaled at each step) and eventually reaches an equilibrium when the in/out flow is balanced with the neighbors. The final $x = \{x_1, x_2, x_3, x_4, x_5\}$ captures the centrality.
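The repeated update x ← Ax is power iteration. A minimal sketch, assuming a hypothetical 5-node graph and rescaling x to unit length after every multiplication (without rescaling the entries grow without bound); neither choice comes from the slides.

```python
import math

# Hypothetical 5-node undirected graph (adjacency matrix A), not the slide's graph.
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

def mat_vec(A, x):
    # (Ax)_i = sum_j A_ij x_j: the sum of node i's neighbours' centralities
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def eigenvector_centrality(A, tol=1e-12, max_iter=10_000):
    x = [float(sum(row)) for row in A]        # start from the degree vector
    for _ in range(max_iter):
        y = mat_vec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]             # rescale so values do not blow up
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

x = eigenvector_centrality(A)
print([round(v, 3) for v in x])
```

The equilibrium satisfies $Ax = \lambda x$, i.e., x is the eigenvector of the largest eigenvalue. Convergence is guaranteed here because the graph is connected and not bipartite (it contains a triangle).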
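24 Eigenvector Centrality: recall linear algebra basics. Eigenvectors (for a square m×m matrix A): $Av = \lambda v$, where v is a (right) eigenvector and $\lambda$ the corresponding eigenvalue. There are at most m distinct solutions $\lambda_1, \lambda_2, \ldots, \lambda_m$. For a symmetric A (e.g., the adjacency matrix of an undirected graph), eigenvectors for distinct eigenvalues are orthogonal: $\lambda_1 \neq \lambda_2 \Rightarrow v_1 \cdot v_2 = 0$.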
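25 Eigenvector Centrality: find the solution by eigenvalue decomposition, $A = U \Lambda U^{-1}$ with $\Lambda$ diagonal (assuming A has m linearly independent eigenvectors). The columns of U are eigenvectors of A; the diagonal elements of $\Lambda$ are eigenvalues of A. The largest eigenvalues have the largest effect.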
26 Eigenvector Centrality: take the eigenvector corresponding to the largest eigenvalue $\lambda_{\max}$ in $Ax = \lambda x$, e.g., in Matlab: [vector, value] = eig(A). [Figure: eigenvalue decomposition of the example graph, nodes C and D]
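For readers working in Python rather than Matlab, NumPy's `numpy.linalg.eigh` (for symmetric matrices such as an undirected adjacency matrix) can play the role of `eig`. A minimal sketch on a hypothetical 5-node graph; `eigh` returns eigenvalues in ascending order with eigenvectors as columns, so the last column is the principal eigenvector.

```python
import numpy as np

# Hypothetical undirected 5-node graph; A is symmetric, so eigh applies.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Eigenvalues come back in ascending order; eigenvectors are the columns.
values, vectors = np.linalg.eigh(A)
lam_max = values[-1]
x = vectors[:, -1]
x = x if x.sum() >= 0 else -x   # eigenvectors are only defined up to sign
print(round(lam_max, 3), np.round(x, 3))
```

The sign flip matters because a solver may return the principal eigenvector with all-negative entries; for a connected graph the correctly signed vector has all-positive centralities.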
27 Importance of Nodes in summary [Figure: example graph with numbered nodes, including 5, 6 and 7]: Closeness centrality: they have the shortest paths to all other nodes. Degree centrality: it has the most direct connections. Eigenvector centrality: its neighbors are the most important. Betweenness centrality: it connects the two nodes to all other nodes.
28 10 min break
29 Strength of Ties and Similarity
Measurement | Description | Applications
Similarity | Indicates the similarity between nodes by their common attributes (e.g., contacts or interests) | Determine the tie strength, community of nodes, recommendations, etc.
Tie Strength | Indicates the strength of the link between nodes (e.g., frequency of interactions and duration of encounters) | Determine whether a link is a weak or strong connection, and other hidden and missing info
30 Similarity. Jaccard similarity: $J(U_i, U_j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$, used to quantify the similarity between 2 sets $U_i$ and $U_j$. $0 \le J(U_i, U_j) \le 1$; 1: the 2 sets are identical, 0: the 2 sets have no common elements.
31 Recall: User Profiles in Social Media
32 Learning from Profiles: tie strength based on attributes. Use descriptive attributes of nodes (e.g., interests, common friends, etc.) and consider node similarity based on these attributes, e.g., users A and B have more common interests. [Figure: interest lists of users A, B and C; user A (Reading, Film, Painting, Swimming) has a strong tie (high similarity) with user B and a weak tie (low similarity) with user C; other interests shown include Singing, TV Game and Hiking]
33 Learning from Interactions: tie strength based on contacts. Structural features: # of common friends, Jaccard similarity of the friend sets. More common friends → higher similarity, e.g., users A and B have more common friends. [Figure: users A and B are both friends of D, E, F (strong tie, high similarity); user C is a friend of D, G, H (weak tie, low similarity)]
34 Similarity: Jaccard similarity example (favorite interests). User A: $U_A$ = {Reading, Film, Painting, Swimming}; User B: $U_B$ = {Reading, Film, Painting, Singing}; User C: $U_C$ = {Reading, TV games}. $J(U_A, U_B)$ = # of interests in common / total # of interests the two people have: the intersection {Reading, Painting, Film} has 3 elements and the union {Reading, Film, Painting, Swimming, Singing} has 5, so $J(U_A, U_B) = 3/5$. Likewise, the intersection {Reading} and the union {Reading, Film, Painting, Swimming, TV games} give $J(U_A, U_C) = 1/5$.
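The set-based definition maps directly onto Python sets. A minimal sketch reproducing the numbers above; returning 0 for two empty sets is our own convention, since the slide does not define that case.

```python
def jaccard(s, t):
    """J(U_i, U_j) = |U_i & U_j| / |U_i | U_j| (assumed 0 for two empty sets)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0

U_A = {"Reading", "Film", "Painting", "Swimming"}
U_B = {"Reading", "Film", "Painting", "Singing"}
U_C = {"Reading", "TV games"}

print(jaccard(U_A, U_B))  # 3 / 5 = 0.6
print(jaccard(U_A, U_C))  # 1 / 5 = 0.2
```

The same function applies unchanged to friend sets (slide 33), e.g., two users who are both friends of exactly D, E and F get a similarity of 1.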
35 Similarity: Jaccard similarity special issues. For a user B with interests {Reading, Film, Singing}, the common interests are {Reading, Film}. Using the union {Reading, Film, Painting, Swimming, Singing} as the denominator gives $J(U_A, U_B) = 2/5$; using all possible interests {Reading, Film, Painting, Swimming, Singing, TV games} gives $2/6 = 1/3$. Possible choices of denominator: 1) the union of the 2 users' interests, or 2) all possible interests? PS: if the former is used, some information may be lost.
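The two denominator choices can be compared side by side. In this sketch, user B's interests are those implied by the slide's sets, and the universe of "all possible interests" is assumed to be exactly the six interests listed on the slide.

```python
U_A = {"Reading", "Film", "Painting", "Swimming"}
U_B = {"Reading", "Film", "Singing"}          # B's interests as implied by the slide
ALL_INTERESTS = U_A | U_B | {"TV games"}      # assumed universe of all possible interests

common = len(U_A & U_B)
by_union = common / len(U_A | U_B)            # choice 1: union of the two users' interests
by_universe = common / len(ALL_INTERESTS)     # choice 2: all possible interests
print(by_union, by_universe)                  # 2/5 and 2/6
```

With choice 2, adding unrelated interests to the universe lowers every pair's score even though the two users themselves have not changed.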
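36 Similarity. Cosine similarity: $C(U_i, U_j) = \frac{U_i \cdot U_j}{\|U_i\| \, \|U_j\|}$, used to quantify the similarity between two vectors (e.g., frequency profiles). For non-negative vectors, $0 \le C(U_i, U_j) \le 1$; 1: the 2 vectors point in the same direction (identical up to scaling), 0: the 2 vectors have no common non-zero elements.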
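37 Similarity: Cosine similarity example. Restaurant visit frequency: User A: $U_A$ = {LG1: 3, Café: 5, McDonalds: 5}; User B: $U_B$ = {LG1: 2, Café: 6, McDonalds: 4}; User C: $U_C$ = {LG1: 10, Café: 1, McDonalds: 0}. $C(U_A, U_B) = \frac{3 \cdot 2 + 5 \cdot 6 + 5 \cdot 4}{\sqrt{3^2+5^2+5^2}\,\sqrt{2^2+6^2+4^2}} = \frac{56}{\sqrt{59}\sqrt{56}} \approx 0.97$; $C(U_A, U_C) = \frac{3 \cdot 10 + 5 \cdot 1 + 5 \cdot 0}{\sqrt{59}\,\sqrt{10^2+1^2+0^2}} = \frac{35}{\sqrt{59}\sqrt{101}} \approx 0.45$.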
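A minimal sketch of cosine similarity on the restaurant visit frequencies above, treating each user as a dictionary from restaurant to count (missing keys count as 0). It assumes neither vector is all zeros, since the formula is undefined in that case.

```python
import math

def cosine(u, v):
    """C(U_i, U_j) = (U_i . U_j) / (||U_i|| ||U_j||); assumes non-zero vectors."""
    keys = u.keys() | v.keys()
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

U_A = {"LG1": 3, "Café": 5, "McDonalds": 5}
U_B = {"LG1": 2, "Café": 6, "McDonalds": 4}
U_C = {"LG1": 10, "Café": 1, "McDonalds": 0}

print(round(cosine(U_A, U_B), 3))  # 56 / sqrt(59 * 56) ≈ 0.974
print(round(cosine(U_A, U_C), 3))  # 35 / sqrt(59 * 101) ≈ 0.453
```

Unlike Jaccard on sets, cosine uses how often each place is visited, so A and B (similar visit patterns) score far higher than A and C.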
38 Tie Strength: Weak or Strong? Connections (links) do not all have the same strength. Interpersonal social networks in real life have strong ties (close friends) and weak ties (acquaintances), both relevant to community formation and information diffusion. Strength of Weak Ties (Granovetter, 1973): occasional encounters with distant acquaintances provide new opportunities in job search.
39 Weak and Strong Ties
40 How does the strength of a tie influence diffusion? M. S. Granovetter, "The Strength of Weak Ties," AJS, 1973: people who found a job through a contact saw that contact frequently (2+/week): 16.7%; occasionally (more than once a year but < 2/week): 55.6%; rarely: 27.8%. But the length of the path is short: the contact directly works for or is the employer, or is connected directly to the employer. PS: Any real-life experience?
41 Zachary's Karate Club Dataset: Zachary's Karate Club is a dataset describing the social relationships among the members of a karate club, as described by Wayne W. Zachary in his paper "An Information Flow Model for Conflict and Fission in Small Groups": 34 nodes representing the members of the club and 77 edges representing the friendships between members. src:
42 Out-of-class activity 3 (due before Tutorial #3): Read the paper "An Information Flow Model for Conflict and Fission in Small Groups" (3400+ citations in 2018), Conflict_and_Fission.pdf. 1. Give 3 points about its contributions. 2. Give 3 possible extensions using what we have learnt in the course. 3. Submit by Facebook post before noon tomorrow.
43 End of Lecture. Questions / Comments?
44 Recommendation
45 Types of Recommendations: 1. Images (Flickr) 2. Videos (YouTube, Youku, Netflix) 3. Cuisine (Openrice, Dianping) 4. Friends/Members/Articles (Facebook, Renren, WeChat, Line, etc.) 5. Webpages/bookmarks (Delicious) 6. Products (eBay, Amazon)
46 Recommendation Inputs: 1. When users' interests/preferences are specified by the users themselves, recommend by those criteria. 2. Recommend using social data, history, and behavioral data with machine learning and data mining techniques. [Figure: Netflix recommendation system example]
47 Common Techniques in Social Networks: 1. Collaborative filtering (CF): understand user properties for recommendation, e.g., tagging for user-generated content. 2. K-NN based recommendation: understand the item and user properties for recommendation; similarities among items and users are calculated.
48 Collaborative Filtering
49 Collaborative Filtering (CF): 1. The most prominent approach, used by large commercial e-commerce sites: well understood, with various algorithms and variations, and applicable in many domains (books, movies, ...). 2. Basic assumption and idea: customers' tastes do not change much over time.
50 Collaborative Filtering (CF): Leveraging Similarity. How it works: 1. Should item 1 be recommended to Tim, based on the user-item matrix? 2. Two approaches: user-to-user (calculate user similarity) and item-to-item (calculate item similarity). [Figure: user-item matrix with user-to-user and item-to-item comparisons]
51 Collaborative Filtering (CF): user-to-user. Find similar users (who also have similar tastes), e.g., with Jaccard similarity $J(U_i, U_j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$. Jane and Tim both liked item 2 and disliked item 3, so they have similar tastes. Item 1 is recommended to Tim (item 1 is liked by Jane).
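A minimal user-to-user sketch. Only the Jane/Tim pattern comes from the slide; the third user Dom and his ratings are hypothetical. Each (item, like/dislike) pair is treated as one set element, so "both liked item 2" and "both disliked item 3" each count as agreement in the Jaccard score.

```python
# Hypothetical like(+1)/dislike(-1) matrix; only the Jane/Tim pattern is from the slide.
ratings = {
    "Jane": {1: +1, 2: +1, 3: -1},
    "Tim":  {2: +1, 3: -1},
    "Dom":  {2: -1, 3: +1, 4: +1},
}

def jaccard(s, t):
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def opinions(user):
    # Treat each (item, like/dislike) pair as a set element.
    return set(ratings[user].items())

def recommend_user_based(target):
    others = [u for u in ratings if u != target]
    best = max(others, key=lambda u: jaccard(opinions(target), opinions(u)))
    # Recommend items the most similar user liked that the target has not rated.
    return best, [i for i, r in ratings[best].items()
                  if r > 0 and i not in ratings[target]]

print(recommend_user_based("Tim"))  # ('Jane', [1])
```

Tim and Jane share 2 of 3 opinions (similarity 2/3), while Tim and Dom share none, so Jane is Tim's nearest neighbor and her liked item 1 is recommended.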
52 Collaborative Filtering (CF): user-to-user. User-based nearest neighbor (neighbor = similar user): generate a prediction for an item i by analyzing the ratings for i from users in u's neighborhood: $pred(u, i) = \bar{r}_u + \frac{\sum_{v \in neighbors(u)} sim(u, v) \cdot (r_{vi} - \bar{r}_v)}{\sum_{v \in neighbors(u)} sim(u, v)}$, where $\bar{r}_u$ is u's average rating and $r_{vi}$ is v's rating of item i.
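A minimal sketch of the user-based prediction formula. The ratings (1 to 5 scale) and the similarity values 0.85 and 0.70 are hypothetical and assumed to be precomputed; each user's mean $\bar{r}$ is taken over all of that user's ratings, and at least one neighbor is assumed to have rated the item.

```python
# Hypothetical ratings (1-5 scale) and precomputed neighbour similarities.
ratings = {
    "Alice": {"i1": 5, "i2": 3, "i3": 4, "i4": 4},            # target user, no rating for i5
    "User1": {"i1": 3, "i2": 1, "i3": 2, "i4": 3, "i5": 3},
    "User2": {"i1": 4, "i2": 3, "i3": 4, "i4": 3, "i5": 5},
}
sim = {("Alice", "User1"): 0.85, ("Alice", "User2"): 0.70}    # assumed values

def mean_rating(user):
    r = ratings[user]
    return sum(r.values()) / len(r)

def predict_user_based(u, i, neighbors):
    """pred(u,i) = mean_u + sum sim(u,v)(r_vi - mean_v) / sum sim(u,v), over neighbours who rated i."""
    rated = [v for v in neighbors if i in ratings[v]]
    num = sum(sim[(u, v)] * (ratings[v][i] - mean_rating(v)) for v in rated)
    den = sum(sim[(u, v)] for v in rated)
    return mean_rating(u) + num / den

print(round(predict_user_based("Alice", "i5", ["User1", "User2"]), 2))  # 4.87
```

Subtracting each neighbor's mean corrects for users who rate systematically high or low: both neighbors rated i5 above their own averages, so Alice's prediction lands above her average of 4.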
53 Collaborative Filtering (CF): item-to-item. Find items that have similar subscribers. Dom and Sandra are 2 users who both like items 1 and 4, so users who like item 4 also tend to like item 1; therefore, item 1 will be recommended to Tim.
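An item-to-item sketch: items are compared by the sets of users who like them, here with Jaccard similarity. Dom and Sandra liking items 1 and 4 is from the slide; Tim liking item 4 and Jane liking item 2 are assumptions added for illustration.

```python
# Hypothetical "likes" data: Dom and Sandra like items 1 and 4 (as on the slide);
# Tim liking item 4 and Jane liking item 2 are assumptions for illustration.
likes = {
    "Dom":    {1, 4},
    "Sandra": {1, 4},
    "Tim":    {4},
    "Jane":   {2},
}

def likers(item):
    return {u for u, items in likes.items() if item in items}

def jaccard(s, t):
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def recommend_item_based(user):
    items = set().union(*likes.values())
    candidates = items - likes[user]
    # Score each unseen item by its best similarity to an item the user already likes.
    score = {c: max(jaccard(likers(c), likers(j)) for j in likes[user]) for c in candidates}
    return max(score, key=score.get), score

print(recommend_item_based("Tim"))
```

Item 1's likers {Dom, Sandra} overlap strongly with item 4's likers {Dom, Sandra, Tim} (similarity 2/3), so item 1 is the recommendation for Tim.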
54 Collaborative Filtering (CF): item-to-item. Item-based nearest neighbor: generate predictions based on similarities between items. The prediction for a user u and item i is a weighted sum of user u's ratings for the items most similar to i: $pred(u, i) = \frac{\sum_{j \in ratedItems(u)} sim(i, j) \cdot r_{uj}}{\sum_{j \in ratedItems(u)} sim(i, j)}$
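A minimal sketch of the item-based prediction. The user's ratings of items a, b, c and their similarities to the target item are hypothetical, assumed to be precomputed; the optional parameter `k` (our addition) restricts the sum to the k rated items most similar to the target, matching "items most similar to i".

```python
# Hypothetical data: user u's ratings and precomputed similarities sim(i, j)
# between the target item i and each item j that u has rated.
user_ratings = {"a": 4, "b": 2, "c": 5}
sim_to_target = {"a": 0.9, "b": 0.1, "c": 0.6}

def predict_item_based(user_ratings, sim_to_target, k=None):
    """pred(u,i) = sum_j sim(i,j) r_uj / sum_j sim(i,j) over items j rated by u.

    If k is given, only the k rated items most similar to i are used.
    """
    rated = sorted(user_ratings, key=lambda j: sim_to_target[j], reverse=True)
    if k is not None:
        rated = rated[:k]
    num = sum(sim_to_target[j] * user_ratings[j] for j in rated)
    den = sum(sim_to_target[j] for j in rated)
    return num / den

print(round(predict_item_based(user_ratings, sim_to_target), 2))       # 6.8 / 1.6 = 4.25
print(round(predict_item_based(user_ratings, sim_to_target, k=2), 2))  # 6.6 / 1.5 = 4.4
```

Dropping the weakly similar item b (k = 2) removes its low rating from the weighted average and raises the prediction.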
55 Example: Friendship Recommendation. 1. Similarity among users can be found through the user-item matrix. 2. Recommend Don to Jane (as an online friend), since they have the most similar tastes (common interests). [Figure: user-item matrix for Jane, Tim and Don]
56 K-nearest neighbors (K-NN)
57 Recommendation as a Classification Problem: 1. 2 classes: like or dislike. 2. Recommendation: find items that will be liked. 3. Example: which clothes will be liked?
58 K-nearest neighbors (K-NN): 1. One of the simplest machine learning algorithms for classification. 2. Assign an object to the class most common among its k nearest neighbors, using some voting mechanism. 3. Different neighbors could have different weights, e.g., the nearest one has a higher weight (by similarity).
59 K-NN: classifying a fish. 2 classes (sea bass and salmon), k = 3: the nearest neighbors are 2 sea bass and 1 salmon, so the fish is classified as sea bass. 3 classes (sea bass, salmon and eel), k = 5: 3 sea bass, 1 eel and 1 salmon, so it is classified as sea bass.
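A minimal majority-vote k-NN sketch for the fish example. The two features (length in cm and a lightness score) and all sample values are hypothetical, chosen so that the k = 3 vote comes out as 2 sea bass to 1 salmon as on the slide; distances are plain Euclidean on unscaled features.

```python
import math
from collections import Counter

# Hypothetical labelled fish: (length in cm, lightness score) -> class.
train = [
    ((60, 4.0), "sea bass"), ((65, 3.5), "sea bass"), ((70, 4.2), "sea bass"),
    ((40, 7.0), "salmon"),   ((45, 6.5), "salmon"),   ((50, 6.8), "salmon"),
]

def knn_classify(train, query, k=3):
    """Majority vote among the k training samples nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], votes

label, votes = knn_classify(train, (58, 5.0), k=3)
print(label, dict(votes))  # sea bass {'sea bass': 2, 'salmon': 1}
```

An odd k avoids ties with 2 classes. In practice features are usually rescaled first, because here length (tens of cm) dominates the distance over lightness.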
60 K-NN: an algorithm to find users/objects with similar tastes/subscribers. Step 1: Collect a set of labeled samples (liked / disliked). Step 2: Find the k nearest neighbors, i.e., the k items with the highest similarity to the input item. Step 3: Classify the input item, e.g., like or dislike. If k = 3, the query instance in this case is classified as positive (liked), since 2 of its 3 nearest neighbors are positive. [Figure: Tim's liked and disliked items around a query item]
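The three steps can be sketched for the like/dislike case. The labeled items and their similarity scores to the query item are hypothetical; the optional `weighted` vote (each neighbor counts with its similarity) illustrates slide 58's point that different neighbors could have different weights.

```python
# Step 1 (hypothetical): items Tim has labelled, with an assumed similarity
# of each item to the query item.
labelled = [
    ("item A", "liked", 0.92),
    ("item B", "liked", 0.85),
    ("item C", "disliked", 0.80),
    ("item D", "disliked", 0.40),
    ("item E", "liked", 0.10),
]

def classify(labelled, k=3, weighted=False):
    # Step 2: keep the k samples with the highest similarity to the query.
    nearest = sorted(labelled, key=lambda s: s[2], reverse=True)[:k]
    # Step 3: vote; optionally weight each neighbour by its similarity.
    tally = {}
    for _, label, similarity in nearest:
        tally[label] = tally.get(label, 0.0) + (similarity if weighted else 1.0)
    return max(tally, key=tally.get), tally

print(classify(labelled, k=3))                 # ('liked', {'liked': 2.0, 'disliked': 1.0})
print(classify(labelled, k=3, weighted=True))
```

With k = 3, two of the three nearest neighbors are liked, so the query item is classified as liked under both the plain and the similarity-weighted vote.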
More informationCS 175: Project in Artificial Intelligence. Slides 4: Collaborative Filtering
CS 175: Project in Artificial Intelligence Slides 4: Collaborative Filtering 1 Topic 6: Collaborative Filtering Some slides taken from Prof. Smyth (with slight modifications) 2 Outline General aspects
More informationDesigning Information Devices and Systems I Spring 2016 Elad Alon, Babak Ayazifar Homework 12
EECS 6A Designing Information Devices and Systems I Spring 06 Elad Alon, Babak Ayazifar Homework This homework is due April 6, 06, at Noon Homework process and study group Who else did you work with on
More informationMemory-Efficient Low Rank Approximation of Massive Graphs
Fast and Memory-Efficient Low Rank Approximation of Massive Graphs Inderjit S. Dhillon University of Texas at Austin Purdue University Jan 31, 2012 Joint work with Berkant Savas, Donghyuk Shin, Si Si Han
More informationOnline Social Networks and Media. Opinion formation on social networks
Online Social Networks and Media Opinion formation on social networks Diffusion of items So far we have assumed that what is being diffused in the network is some discrete item: E.g., a virus, a product,
More informationRestricted Boltzmann Machines for Collaborative Filtering
Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem
More informationUnsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent
Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:
More information6.207/14.15: Networks Lecture 16: Cooperation and Trust in Networks
6.207/14.15: Networks Lecture 16: Cooperation and Trust in Networks Daron Acemoglu and Asu Ozdaglar MIT November 4, 2009 1 Introduction Outline The role of networks in cooperation A model of social norms
More informationRecommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University
2018 EE448, Big Data Mining, Lecture 10 Recommender Systems Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html Content of This Course Overview of
More informationDimensionality Reduction
Dimensionality Reduction Le Song Machine Learning I CSE 674, Fall 23 Unsupervised learning Learning from raw (unlabeled, unannotated, etc) data, as opposed to supervised data where a classification of
More informationCollaborative Topic Modeling for Recommending Scientific Articles
Collaborative Topic Modeling for Recommending Scientific Articles Chong Wang and David M. Blei Best student paper award at KDD 2011 Computer Science Department, Princeton University Presented by Tian Cao
More informationFinite Markov Information-Exchange processes
Finite Markov Information-Exchange processes David Aldous February 2, 2011 Course web site: Google Aldous STAT 260. Style of course Big Picture thousands of papers from different disciplines (statistical
More informationWiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte
Reputation Systems I HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Yury Lifshits Wiki Definition Reputation is the opinion (more technically, a social evaluation) of the public toward a person, a
More informationLink Analysis Ranking
Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query
More informationWhy matrices matter. Paul Van Dooren, UCL, CESAME
Why matrices matter Paul Van Dooren, UCL, CESAME Where are matrices coming from? ma trix (mā'trĭks) n., pl., ma tri ces (mā'trĭ-sēz') Anatomy. The womb (uterus).... Geology. The solid matter in which a
More informationGenerative Models for Discrete Data
Generative Models for Discrete Data ddebarr@uw.edu 2016-04-21 Agenda Bayesian Concept Learning Beta-Binomial Model Dirichlet-Multinomial Model Naïve Bayes Classifiers Bayesian Concept Learning Numbers
More informationSUPPLEMENTARY MATERIALS TO THE PAPER: ON THE LIMITING BEHAVIOR OF PARAMETER-DEPENDENT NETWORK CENTRALITY MEASURES
SUPPLEMENTARY MATERIALS TO THE PAPER: ON THE LIMITING BEHAVIOR OF PARAMETER-DEPENDENT NETWORK CENTRALITY MEASURES MICHELE BENZI AND CHRISTINE KLYMKO Abstract This document contains details of numerical
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 18: HMMs and Particle Filtering 4/4/2011 Pieter Abbeel --- UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore
More informationStat 315c: Introduction
Stat 315c: Introduction Art B. Owen Stanford Statistics Art B. Owen (Stanford Statistics) Stat 315c: Introduction 1 / 14 Stat 315c Analysis of Transposable Data Usual Statistics Setup there s Y (we ll
More informationGoogle Page Rank Project Linear Algebra Summer 2012
Google Page Rank Project Linear Algebra Summer 2012 How does an internet search engine, like Google, work? In this project you will discover how the Page Rank algorithm works to give the most relevant
More informationLeverage Sparse Information in Predictive Modeling
Leverage Sparse Information in Predictive Modeling Liang Xie Countrywide Home Loans, Countrywide Bank, FSB August 29, 2008 Abstract This paper examines an innovative method to leverage information from
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart
More informationRising Algebra 2/Trig Students!
Rising Algebra 2/Trig Students! As a 7 th grader entering in to Algebra 2/Trig next year, it is very important that you have mastered the topics listed below. The majority of the topics were taught in
More information