Machine Learning CPSC 340. Tutorial 12


Random Walk on Graphs and the PageRank Algorithm

Label Propagation on Graph
Assume a strongly connected graph G = (V, A):
- V: set of nodes
- A: adjacency matrix
There are two types of nodes, labeled and unlabeled, and each label is either +1 or -1.
Goal: assign a label to every unlabeled node.

Random Walk for Label Propagation
Random walk on a graph: we jump from one node to another with some probability.
Label propagation algorithm:
- start from an unlabeled node v
- run the random walk below k times starting from v and store the output labels
- take a majority vote among the stored labels

Random Walk for Label Propagation
Random walk algorithm: repeat until a label is output.
- Let v be the node you are currently at, and let d_v be its number of neighbours.
- If v is unlabeled: with uniform probability 1/d_v, pick one of its neighbours and jump to that node.
- If v is labeled:
  - with probability 1/(d_v + 1), output its label and stop;
  - with uniform probability 1/(d_v + 1), pick one of its neighbours and jump to that node.

Exercise: Assume we are given the adjacency matrix, a labellist (a matrix whose first column contains node numbers and whose second column contains class labels), and a starting node. Write code for the random walk algorithm.

RW code
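
A minimal sketch of what this code might look like, assuming the adjacency matrix is a 0/1 NumPy array, labellist is laid out as described in the exercise (node indices in the first column, labels +1/-1 in the second), and nodes are indexed from 0. The function names random_walk and label_propagation are illustrative, not taken from the original tutorial code.

```python
import numpy as np

def random_walk(A, labels, start, rng):
    """Run one random walk from `start` until a label is output."""
    v = start
    while True:
        neighbours = np.flatnonzero(A[v])      # indices j with A[v, j] = 1
        d_v = len(neighbours)
        if v in labels:
            # Labeled node: output its label with probability 1/(d_v + 1),
            # otherwise jump to a uniformly chosen neighbour.
            if rng.random() < 1.0 / (d_v + 1):
                return labels[v]
            v = rng.choice(neighbours)
        else:
            # Unlabeled node: jump to a uniformly chosen neighbour.
            v = rng.choice(neighbours)

def label_propagation(A, labellist, start, k=100, seed=0):
    """Label node `start` by majority vote over k random walks."""
    rng = np.random.default_rng(seed)
    labels = {int(node): int(lab) for node, lab in labellist}
    outputs = [random_walk(A, labels, start, rng) for _ in range(k)]
    return 1 if sum(outputs) >= 0 else -1
```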

Ranking Problem
The ranking problem:
- Input: a large set of objects (and possibly a query object).
- Output, option 1: a score for each object (and possibly for the query).
- Output, option 2: an ordered list of the most relevant objects (possibly for the query).
Examples: country comparisons (Global Hunger Index), academic journals (impact factor), sports/gaming (Elo and TrueSkill), internet search engines.

PageRank
Goal: rank webpages based on some score or weight.
Assume we have n webpages; let the score of webpage i be p_i, and let P = (p_i) be a vector of size n.
Build the directed webpage graph G: node i has an edge into node j if there is a link from page i to page j, i.e. i → j. Assume this graph is strongly connected and aperiodic and has no absorbing node.
Let m_j be the number of outgoing edges from node j, and let m = (m_j) be a vector of size n.
Let A be the adjacency matrix of G, i.e. A_ij = 1 if i → j and A_ij = 0 otherwise.
Let Z = A^T (diag(m))^{-1}. Then Z_ij is the probability of jumping from page j to page i via a link.
But we can also go directly from page j to page i by typing the address of page i into the browser's address bar, so add some small amount to every Z_ij and renormalize.
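
As a tiny sketch of this construction, assuming A is stored as a 0/1 NumPy array with A[i, j] = 1 when page i links to page j (the helper name link_transition_matrix is illustrative):

```python
import numpy as np

def link_transition_matrix(A):
    """Z = A^T diag(m)^{-1}: column j holds the probabilities of following
    a link from page j to each page i."""
    m = A.sum(axis=1)    # m_j = number of outgoing links of page j (assumed > 0)
    return A.T / m       # broadcasting divides column j of A^T by m_j
```

Each column of Z sums to 1, which is why Z_ij can be read as a jump probability out of page j.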

PageRank
For some d ∈ (0, 1), let E = ones(n, n) and T = ((1 - d)/n) E + d Z.
With this transition matrix T for the webpage graph G, we have P = TP.
Each p_i is a probability, so Σ_{i=1}^{n} p_i = 1.
With the two equations above, we can find all the p_i by solving the corresponding linear system, as sketched below.
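
A self-contained sketch of this computation, under the same 0/1 adjacency-matrix assumptions as above; pagerank_scores is an illustrative name, and solving by stacking (T - I)P = 0 with the normalization row is one of several reasonable ways to do it.

```python
import numpy as np

def pagerank_scores(A, d=0.85):
    """Solve P = T P together with the constraint sum_i p_i = 1.

    A[i, j] = 1 if page i links to page j; every page is assumed to
    have at least one outgoing link.  d is the parameter from the
    slides (the exercise below uses d = 3/4).
    """
    n = A.shape[0]
    m = A.sum(axis=1)                         # out-degrees m_j
    Z = A.T / m                               # Z = A^T diag(m)^{-1}
    T = (1 - d) / n * np.ones((n, n)) + d * Z
    # Stack (T - I) P = 0 with the row  1^T P = 1  and solve.
    M = np.vstack([T - np.eye(n), np.ones((1, n))])
    b = np.append(np.zeros(n), 1.0)
    P, *_ = np.linalg.lstsq(M, b, rcond=None)
    return P
```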

PageRank: Exercise
Let d = 3/4. For the following webpage graph, find A, m, Z, and T. Then set up the corresponding linear system, solve it, and find P.

PageRank: Solution
Every node has two outgoing links, so m = (2, 2, 2, 2)^T and

A = [ 0 1 0 1
      1 0 1 0
      0 1 0 1
      1 0 1 0 ]

Z = A^T (diag(m))^{-1} = [  0   1/2   0   1/2
                           1/2   0   1/2   0
                            0   1/2   0   1/2
                           1/2   0   1/2   0  ]

T = ((1 - d)/n) E + d Z = (1/16) E + (3/4) Z = (1/16) [ 1 7 1 7
                                                        7 1 7 1
                                                        1 7 1 7
                                                        7 1 7 1 ]

P = TP, i.e. (T - I) P = 0 (I is the identity matrix), together with Σ_{i=1}^{4} p_i = 1.

PageRank: Solution
Combining the two equations, append the normalization row (1, 1, 1, 1) with right-hand side 1 to the system (T - I) P = 0 and solve for (p_1, p_2, p_3, p_4)^T. This gives

P = (0.25, 0.25, 0.25, 0.25)^T,

i.e. all four pages receive the same PageRank score.
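
As a quick numerical check, the snippet below reuses the pagerank_scores sketch from earlier, assuming the exercise graph is the bidirectional 4-cycle written out in the solution above.

```python
import numpy as np

# Bidirectional 4-cycle: 1<->2, 2<->3, 3<->4, 4<->1 (0-indexed here)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
print(pagerank_scores(A, d=0.75))   # expected: [0.25 0.25 0.25 0.25]
```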