How Does Google? A journey into the wondrous mathematics behind your favorite websites. David F. Gleich, Computer Science, Purdue University
2 Mathematics underlies an enormous number of the websites we use every day!
3 1. Google's PageRank 2. Multi-armed bandits and internet experiments
5 Larry Page and Sergey Brin created a web-search algorithm called BackRub and spun off a company, Google (its name a play on "googol"), based on the paper: Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, "The PageRank Citation Ranking: Bringing Order to the Web," Technical Report, Stanford InfoLab, 1999. The importance of a page is determined by the importance of pages that link to it.
6 A web-search primer: 1. Crawl webpages. 2. Analyze webpage text (information retrieval). 3. Analyze webpage links. 4. Fit over 200 measures to human evaluations. 5. Produce rankings. 6. Continuously update.
7 Pages, nodes, incoming links, outgoing links, and importance. [Figure: pages a, b, and c link to Purdue, labeled "Important pages that link to me!" and "Important pages that link to Purdue!"]
9 [Image: Tim Davis and Yifan Hu, Sparse Matrix Gallery]
10 The web: 1,000 vertices fit on a sheet of 8.5-by-11 paper, but 1,000,000,000,000 vertices (one trillion) would need paper the size of Manhattan island (23 sq miles)!
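The Manhattan comparison is simple arithmetic. A small sketch, assuming the drawing density stays fixed at 1,000 vertices per 8.5-by-11-inch sheet (the constant and function names are ours, not from the slides):

```python
# Back-of-the-envelope check of the "paper the size of Manhattan" claim.
# Assumption: density stays at 1,000 vertices per 8.5-by-11-inch sheet.
VERTICES_PER_SHEET = 1_000
SHEET_AREA_SQ_IN = 8.5 * 11            # 93.5 square inches per sheet
SQ_IN_PER_SQ_MILE = (5280 * 12) ** 2   # 63,360 inches per mile, squared


def paper_area_sq_miles(num_vertices: int) -> float:
    """Square miles of paper needed to draw num_vertices at the assumed density."""
    sheets = num_vertices / VERTICES_PER_SHEET
    return sheets * SHEET_AREA_SQ_IN / SQ_IN_PER_SQ_MILE


print(f"{paper_area_sq_miles(10**12):.1f} square miles")  # prints: 23.3 square miles
```

At that density a trillion vertices need a billion sheets, which works out to roughly 23 square miles of paper.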
11 We need something better!
12 A wee web-graph: link counting is too easy to game! [Figure: a six-node web graph whose links carry weights 1/3 and 1/2]
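One way to see why counting is gameable: a minimal sketch that scores pages by their number of incoming links. It assumes the wee web-graph's links as read off the equations on slides 13-15 (1→2, 3, 4; 2→3, 6; 3→4; 4→5; 5→4; node 6 links nowhere). The five spam pages are hypothetical.

```python
from collections import Counter

# Out-links of the wee web-graph, read off the equations on slides 13-15.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def in_link_counts(out_links):
    """Score every page by its number of incoming links (its in-degree)."""
    counts = Counter({node: 0 for node in out_links})
    for targets in out_links.values():
        counts.update(targets)  # each link adds one vote for its target
    return counts


honest = in_link_counts(OUT_LINKS)

# Hypothetical spammer: five throwaway pages (numbered 7-11) that all link to node 6.
gamed_links = dict(OUT_LINKS)
for spam_page in range(7, 12):
    gamed_links[spam_page] = [6]
gamed = in_link_counts(gamed_links)

print(honest.most_common(1))  # node 4 leads with 3 incoming links
print(gamed.most_common(1))   # node 6 now leads with 6 incoming links
```

Creating pages costs almost nothing, so a score that counts links without weighing who makes them is easy to inflate.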
13 A wee web-graph: link counting is too easy to game! The importance of a page is determined by the importance of pages that link to it:

$$
\begin{aligned}
x_1 &= 0\\
x_2 &= \tfrac{1}{3}x_1\\
x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2\\
x_4 &= \tfrac{1}{3}x_1 + x_3 + x_5\\
x_5 &= x_4\\
x_6 &= \tfrac{1}{2}x_2
\end{aligned}
$$
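14 The importance of a page is determined by the importance of pages that link to it. For example, $x_3 = \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2$. In general,

$$x_i = \sum_{j \in b_i} \frac{1}{d_j}\, x_j,$$

where $x_i$ is the importance of page $i$, $b_i$ is the set of back-links to page $i$ (the pages that link to it; this is why it was called BackRub!), $x_j$ is the importance of page $j$, and $d_j$ is the number of links page $j$ uses (its out-degree, in graph theory).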
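15 We can rewrite these equations in a more mathematically convenient way:

$$
\begin{aligned}
x_1 &= 0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6\\
x_2 &= \tfrac{1}{3}x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6\\
x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6\\
x_4 &= \tfrac{1}{3}x_1 + 0x_2 + 1x_3 + 0x_4 + 1x_5 + 0x_6\\
x_5 &= 0x_1 + 0x_2 + 0x_3 + 1x_4 + 0x_5 + 0x_6\\
x_6 &= 0x_1 + \tfrac{1}{2}x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6
\end{aligned}
$$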
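16 And even more conveniently:

$$
\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6 \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
1/3 & 0 & 0 & 0 & 0 & 0\\
1/3 & 1/2 & 0 & 0 & 0 & 0\\
1/3 & 0 & 1 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 1/2 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6 \end{bmatrix}
\qquad\text{or}\qquad x = Px.
$$

Element $k$ in column $m$ is the "probability" of going from node $m$ to node $k$.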
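The matrix can be built mechanically from each page's out-links: every link out of page m carries weight 1/d_m. A sketch using exact fractions, assuming the wee web-graph's links as read off slides 13-15 (the function names are illustrative):

```python
from fractions import Fraction

# Out-links of the wee web-graph (nodes 1-6), read off slides 13-15.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def link_matrix(out_links):
    """Return P as a list of rows, where P[k-1][m-1] = 1/d_m if page m links to page k."""
    n = len(out_links)
    P = [[Fraction(0)] * n for _ in range(n)]
    for m, targets in out_links.items():
        for k in targets:
            P[k - 1][m - 1] = Fraction(1, len(targets))  # d_m = number of links page m uses
    return P


def mat_vec(P, x):
    """Compute the product P x for a list-of-rows matrix."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]


P = link_matrix(OUT_LINKS)
column_sums = [sum(row[m] for row in P) for m in range(len(P))]
print(column_sums)  # every column sums to 1 except node 6's, which sums to 0

# Row 3 of P x reproduces x_3 = x_1/3 + x_2/2; with x_1 = 3, x_2 = 2 that is 1 + 1.
print(mat_vec(P, [3, 2, 0, 0, 0, 0])[2])  # prints: 2
```

The zero column for node 6 is exactly the "anti-social" page that the following slides fix.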
17 The matrix P for websites shows a lot of structure. [Figure: sparsity pattern of P; every dot is a nonzero element indicating a link.] These matrices are sparse and generally have block structure, and that block structure can be exploited to speed up the ranking algorithm.
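18 But this idea doesn't work for the wee web-graph. Nodes 1, 4, and 5 determine everything!

$$
\begin{aligned}
x_1 &= 0\\
x_2 &= \tfrac{1}{3}x_1 = 0\\
x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 = 0\\
x_4 &= \tfrac{1}{3}x_1 + x_3 + x_5 = x_5\\
x_5 &= x_4\\
x_6 &= \tfrac{1}{2}x_2 = 0
\end{aligned}
$$

Any values with $x_4 = x_5$ satisfy these equations, so they do not pick out a single ranking.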
19 But this idea doesn't work for the wee web-graph. Node 1: lonely. Nodes 4 and 5: a mutual admiration society. Node 6: anti-social. [Figure: the wee web-graph with links weighted 1/2 and 1/3] These nodes need to be fixed to get a reliable and useful ranking!
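These three problems can be detected directly from the link lists. A sketch, assuming the wee web-graph's links as read off slides 13-15; the function names are illustrative. The last lines check the slide-18 observation that any vector with equal values on nodes 4 and 5 (and zeros elsewhere) satisfies x = Px.

```python
# Out-links of the wee web-graph (nodes 1-6), read off slides 13-15.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def lonely_nodes(out_links):
    """Pages that no other page links to (no back-links)."""
    linked_to = {k for targets in out_links.values() for k in targets}
    return sorted(set(out_links) - linked_to)


def mutual_admiration_pairs(out_links):
    """Pairs (i, j), i < j, where each page links only to the other."""
    pairs = []
    for i, targets in out_links.items():
        if len(targets) == 1:
            j = targets[0]
            if i < j and out_links.get(j) == [i]:
                pairs.append((i, j))
    return pairs


def anti_social_nodes(out_links):
    """Pages with no outgoing links."""
    return sorted(m for m, targets in out_links.items() if not targets)


def apply_links(out_links, x):
    """Compute y_i = sum over back-links j of x_j / d_j, i.e. y = P x."""
    y = {i: 0.0 for i in out_links}
    for j, targets in out_links.items():
        for i in targets:
            y[i] += x[j] / len(targets)
    return y


print(lonely_nodes(OUT_LINKS))             # [1]
print(mutual_admiration_pairs(OUT_LINKS))  # [(4, 5)]
print(anti_social_nodes(OUT_LINKS))        # [6]

# Two different vectors both satisfy x = P x, so the ranking is not unique.
for c in (1.0, 2.0):
    x = {1: 0.0, 2: 0.0, 3: 0.0, 4: c, 5: c, 6: 0.0}
    print(apply_links(OUT_LINKS, x) == x)  # True for both values of c
```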
20 The gang of four to the rescue: Andrei Markov, Oskar Perron, Georg Frobenius, Richard von Mises.
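21 Let's fix it up and force node 6 to choose, or link to everyone:

$$
P = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
1/3 & 0 & 0 & 0 & 0 & 0\\
1/3 & 1/2 & 0 & 0 & 0 & 0\\
1/3 & 0 & 1 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 1/2 & 0 & 0 & 0 & 0
\end{bmatrix}
\quad\longrightarrow\quad
P = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1/6\\
1/3 & 0 & 0 & 0 & 0 & 1/6\\
1/3 & 1/2 & 0 & 0 & 0 & 1/6\\
1/3 & 0 & 1 & 0 & 1 & 1/6\\
0 & 0 & 0 & 1 & 0 & 1/6\\
0 & 1/2 & 0 & 0 & 0 & 1/6
\end{bmatrix}
$$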
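In code, the fix amounts to replacing an empty out-link list with a link to every page. A sketch with exact fractions, assuming the wee web-graph's links as read off slides 13-15 (the function name is illustrative):

```python
from fractions import Fraction

# Out-links of the wee web-graph (nodes 1-6), read off slides 13-15.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def fixed_link_matrix(out_links):
    """Column-stochastic P: a page with no out-links is treated as linking to every page."""
    n = len(out_links)
    P = [[Fraction(0)] * n for _ in range(n)]
    for m, targets in out_links.items():
        receivers = targets if targets else list(out_links)  # force node 6 to choose
        for k in receivers:
            P[k - 1][m - 1] = Fraction(1, len(receivers))
    return P


P = fixed_link_matrix(OUT_LINKS)
print([str(row[5]) for row in P])                     # node 6's column: six entries of '1/6'
print([sum(row[m] for row in P) for m in range(6)])  # every column now sums to 1
```

With every column summing to 1, P is a proper Markov transition matrix, which is what the Perron-Frobenius argument on the next slides relies on.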
22 Taxation is the way to representation! [Figure: pages a, b, and c link to one page] If the page they link to is a good page, then it'll still be a good page if we tax the importance from a, b, and c. We can redistribute the taxed amounts to all pages, including lonely nodes!
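23 The importance of a page is determined by the importance of pages that link to it, after tax and any benefits:

$$x_i = \alpha \sum_{j \in b_i} \frac{x_j}{d_j} + (1-\alpha)\, b_i.$$

Here $x_j/d_j$ is the total importance that page $j$ contributes to page $i$, $1-\alpha$ is the taxation rate on all pages, and $(1-\alpha)\,b_i$ is the benefit paid to page $i$. (The slide reuses $b_i$: under the sum it is the set of back-links to page $i$; in the last term it is page $i$'s share of the benefits.)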
24 Perron and Frobenius showed the new equation always has a unique solution!

$$
\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6 \end{bmatrix}
= \alpha
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1/6\\
1/3 & 0 & 0 & 0 & 0 & 1/6\\
1/3 & 1/2 & 0 & 0 & 0 & 1/6\\
1/3 & 0 & 1 & 0 & 1 & 1/6\\
0 & 0 & 0 & 1 & 0 & 1/6\\
0 & 1/2 & 0 & 0 & 0 & 1/6
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6 \end{bmatrix}
+ (1-\alpha)
\begin{bmatrix} b_1\\ b_2\\ b_3\\ b_4\\ b_5\\ b_6 \end{bmatrix}
\qquad\text{or}\qquad x = \alpha P x + (1-\alpha) b.
$$
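Because P is column-stochastic after the fix, the matrix I − αP is nonsingular for any 0 ≤ α < 1, so the equation can be solved as an ordinary linear system. A sketch with NumPy, assuming α = 0.85 and equal benefits b_i = 1/6 (both are our choices for illustration; the slide leaves them unspecified):

```python
import numpy as np

# Out-links of the wee web-graph (nodes 1-6), read off slides 13-15.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def fixed_link_matrix(out_links):
    """Column-stochastic P; a page with no out-links links to every page."""
    n = len(out_links)
    P = np.zeros((n, n))
    for m, targets in out_links.items():
        receivers = targets or list(out_links)
        for k in receivers:
            P[k - 1, m - 1] = 1.0 / len(receivers)
    return P


def pagerank_direct(P, alpha=0.85, b=None):
    """Solve x = alpha P x + (1 - alpha) b, i.e. (I - alpha P) x = (1 - alpha) b."""
    n = P.shape[0]
    if b is None:
        b = np.full(n, 1.0 / n)  # assumption: benefits shared equally
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * b)


x = pagerank_direct(fixed_link_matrix(OUT_LINKS))
ranking = [int(i) + 1 for i in np.argsort(-x)]  # node numbers, most important first
print(np.round(x, 4))
print("ranking (best first):", ranking)
```

With these choices node 4 comes out on top, followed by node 5, and the lonely node 1 ranks last; the importances sum to 1 because P's columns and b each sum to 1.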
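25 What von Mises and Richardson showed is that guess, check, and correct works!

$$x^{(\text{new})} = \alpha P x^{(\text{old})} + (1-\alpha)\, b$$

Starting from a guess $x^{(\text{start})}$, repeating the update gives $x^{(1)}, x^{(2)}, \ldots$ [Figure: successive iterates on the wee web-graph]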
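Guess, check, and correct translates into a short loop. A sketch of this fixed-point iteration in plain Python, assuming the wee web-graph from slides 13-15, α = 0.85, uniform benefits b_i = 1/6, and a stopping tolerance chosen for illustration. Since P is column-stochastic, each sweep shrinks the error (in the 1-norm) by at least a factor of α, so the loop converges.

```python
# Out-links of the wee web-graph (nodes 1-6), read off slides 13-15.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def pagerank_iterate(out_links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Guess, check, and correct: repeat x_new = alpha * P x_old + (1 - alpha) * b.

    Returns the final vector (a dict keyed by node) and the number of sweeps used.
    """
    nodes = list(out_links)
    n = len(nodes)
    b = {i: 1.0 / n for i in nodes}  # assumption: benefits shared equally
    x = dict(b)                      # starting guess x^(start)
    for sweep in range(1, max_iter + 1):
        x_new = {i: (1 - alpha) * b[i] for i in nodes}
        for j in nodes:
            receivers = out_links[j] or nodes  # a page with no links "links to everyone"
            share = alpha * x[j] / len(receivers)
            for i in receivers:
                x_new[i] += share
        change = sum(abs(x_new[i] - x[i]) for i in nodes)  # check
        x = x_new                                           # correct
        if change < tol:
            break
    return x, sweep


x, sweeps = pagerank_iterate(OUT_LINKS)
print(sweeps, {i: round(v, 4) for i, v in x.items()})
```

The loop only ever multiplies by the sparse link structure, never forms or inverts a matrix, which is why this style of iteration scales to web-sized graphs.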
27 There's still a lot of work left to do to make a search engine: Make it fast! Watch out for spam. Watch out for manipulation. Personalize. Experiment!
28 1. Google's PageRank 2. Multi-armed bandits and internet experiments
29 Not this!
30 This! [Image: four slot machines] Pays out $0.99/dollar. Pays out $0.95/dollar. Pays out $0.92/dollar. Pays out $0.98/dollar.
31 What in the heck does a multi-armed bandit have to do with Google?
32 What in the heck does a multi-armed bandit have to do with Google? Pays out $0.91/view to show ads. Pays out -$0.02/view to hide ads. Pays out $0.92/view. Pays out $0.66/view.
33 How to optimize your website without exploiting the bandits: Try condition A 100 times, find 45 wins. Try condition B 100 times, find 85 wins. Try condition C 100 times, find 10 wins. Choose the best!
34 This field has some of the best terminology: Explore! Exploit! Regret!
35 This field has some of the best terminology. Explore: visiting Las Vegas! Exploit: your new winning strategy! Regret: that you didn't quit after winning the first round.
36 This field has some of the best terminology. Explore: testing slot machines/experiments for their reward. Exploit: playing the best reward you've found so far. Regret: how much you lost due to exploration.
37 How to optimize your website without exploiting the bandits: Try condition A 100 times, find 45 wins. Try condition B 100 times, find 85 wins. Try condition C 100 times, find 10 wins. Choose the best! We only exploit our findings at the end: pure exploration!
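The pure-exploration recipe is easy to state in code: estimate each condition's win rate from its trials, then commit to the best. A minimal sketch using the counts on the slide (the function name is ours):

```python
def choose_best(results):
    """Pure exploration: estimate each condition's win rate, then pick the highest.

    results maps a condition name to a (wins, trials) pair.
    """
    rates = {name: wins / trials for name, (wins, trials) in results.items()}
    best = max(rates, key=rates.get)
    return best, rates


results = {"A": (45, 100), "B": (85, 100), "C": (10, 100)}
best, rates = choose_best(results)
print(best, rates)  # prints: B {'A': 0.45, 'B': 0.85, 'C': 0.1}
```

All 300 trials are spent before any exploiting happens, including the 100 plays of condition C that were unlikely to pay off.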
38 How to optimize your website exploiting the bandits. Pure exploration: try condition A 5 times, find 4 wins; try condition B 5 times, find 4 wins; try condition C 5 times, find 2 wins. Exploit our knowledge: try condition A 7 times, find 3 wins; try condition B 7 times, find 5 wins; try condition C 1 time, find 0 wins. [Table: estimated return for conditions A, B, and C]
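The slide's strategy mixes exploring and exploiting as it goes. One standard way to automate that mix (a swapped-in technique, not necessarily the one on the slide) is epsilon-greedy: with probability ε play a random condition, otherwise play the one with the best observed win rate. A simulation sketch; the true win rates 0.45, 0.85, and 0.10, ε = 0.1, the number of plays, and the seed are all hypothetical choices for illustration.

```python
import random


def epsilon_greedy(win_probs, plays, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit player.

    win_probs maps each condition to its (hypothetical) true win probability.
    Returns the number of plays and wins recorded for each condition.
    """
    rng = random.Random(seed)
    arms = list(win_probs)
    pulls = {a: 0 for a in arms}
    wins = {a: 0 for a in arms}
    for t in range(plays):
        if t < len(arms):
            arm = arms[t]                      # try every condition once to start
        elif rng.random() < epsilon:
            arm = rng.choice(arms)             # explore
        else:
            arm = max(arms, key=lambda a: wins[a] / pulls[a])  # exploit
        pulls[arm] += 1
        wins[arm] += int(rng.random() < win_probs[arm])        # simulated win or loss
    return pulls, wins


pulls, wins = epsilon_greedy({"A": 0.45, "B": 0.85, "C": 0.10}, plays=2000)
print(pulls, wins)
```

Unlike pure exploration, most plays go to whichever condition currently looks best, while the ε fraction keeps checking the others in case the early estimates were unlucky.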
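39 The goal of these problems is to construct optimal strategies that minimize regret. Regret is how much you left on the table by exploring:

$$\text{regret} = E[\text{reward from always playing the best}] - E[\text{reward from plays made based on data}].$$

Regret for 100 plays of each condition (300 plays): $(255 - 140)/300 \approx 0.38$. Regret for the 30-play mixed strategy, against a best-always benchmark of 25.5 wins: 0.31. A zero-regret strategy is one where regret(T) grows sublinearly in the number of plays T, so the regret per play goes to zero as $T \to \infty$.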
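The 100-each regret figure can be checked directly. A sketch, assuming (as the slide's 255/300 benchmark does) that condition B's 0.85 win rate is the true best rate; the function name is ours:

```python
def regret_per_play(results, best_rate):
    """Average regret: (expected wins from always playing the best condition
    minus the wins actually collected) divided by the number of plays.

    results maps a condition name to a (wins, trials) pair.
    """
    plays = sum(trials for _, trials in results.values())
    collected = sum(wins for wins, _ in results.values())
    return (best_rate * plays - collected) / plays


pure_exploration = {"A": (45, 100), "B": (85, 100), "C": (10, 100)}
print(round(regret_per_play(pure_exploration, best_rate=0.85), 2))  # prints: 0.38
```

Always playing B would earn 0.85 × 300 = 255 wins, while the 100-each strategy collects 45 + 85 + 10 = 140.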
40 "[The bandit problem] was formulated during the [second world] war, and efforts to solve it so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage." Peter Whittle (Whittle, 1979), discussion of "Bandit processes and dynamic allocation indices." Their importance to website optimization, advertising, and recommendation has rejuvenated research on these problems, with fascinating new questions.
41 Math is everywhere, especially in your favorite websites! Matrices and probability are key ingredients.
42 Top 10 articles on Wikipedia with highest PageRank, for three values of α (C: marks a category page, P: a portal page):

| Rank | α = 0.50 | α = 0.85 | α = 0.99 |
|---|---|---|---|
| 1 | United States | United States | C:Contents |
| 2 | C:Living people | C:Main topic classif. | C:Main topic classif. |
| 3 | France | C:Contents | C:Fundamental |
| 4 | Germany | C:Living people | United States |
| 5 | England | C:Ctgs. by country | C:Wikipedia admin. |
| 6 | United Kingdom | United Kingdom | P:List of portals |
| 7 | Canada | C:Fundamental | P:Contents/Portals |
| 8 | Japan | C:Ctgs. by topic | C:Portals |
| 9 | Poland | C:Wikipedia admin. | C:Society |
| 10 | Australia | France | C:Ctgs. by topic |
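The table shows the ranking shifting as α changes. A toy illustration on the wee web-graph (not Wikipedia), assuming uniform benefits b_i = 1/6 and the link lists read off slides 13-15: as α grows, less importance is redistributed by taxation and more of it flows into the closed loop between nodes 4 and 5.

```python
import numpy as np

# Out-links of the wee web-graph (nodes 1-6), read off slides 13-15.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def fixed_link_matrix(out_links):
    """Column-stochastic P; a page with no out-links links to every page."""
    n = len(out_links)
    P = np.zeros((n, n))
    for m, targets in out_links.items():
        receivers = targets or list(out_links)
        for k in receivers:
            P[k - 1, m - 1] = 1.0 / len(receivers)
    return P


def pagerank_direct(P, alpha):
    """Solve (I - alpha P) x = (1 - alpha) b with equal benefits b_i = 1/n."""
    n = P.shape[0]
    b = np.full(n, 1.0 / n)  # assumption: benefits shared equally
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * b)


P = fixed_link_matrix(OUT_LINKS)
loop_share = {}
for alpha in (0.50, 0.85, 0.99):
    x = pagerank_direct(P, alpha)
    loop_share[alpha] = x[3] + x[4]  # nodes 4 and 5 sit at 0-based indices 3 and 4
    print(f"alpha = {alpha:.2f}: nodes 4 and 5 hold {loop_share[alpha]:.3f} of the importance")
```

Small α keeps the ranking close to the uniform benefits, while α near 1 lets the link structure, including any tightly interlinked group of pages, dominate.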