Using R for Iterative and Incremental Processing

1 Using R for Iterative and Incremental Processing
Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Robert Schreiber
UC Berkeley and HP Labs

2-3 Big Data, Complex Algorithms
PageRank (dominant eigenvector), recommendations (matrix factorization), anomaly detection (top-K eigenvalues) and user importance (vertex centrality) are machine learning and graph algorithms that all come down to iterative linear algebra operations.
6/29/2012

4 PageRank Using Matrices
[Figure: the update p = M p + Z, with the N x N matrix M and the vectors p and Z drawn in blocks of s rows]
PageRank is a dominant-eigenvector computation. M = modified web graph matrix, p = PageRank vector.
Simplified algorithm: repeat { p = M*p + Z }
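
The simplified update is a power iteration. A minimal single-machine sketch in base R, assuming a damping factor d = 0.85 and a graph in which every page has at least one outgoing link, so that M = d * A * D^-1 (A the link matrix, D the diagonal of out-degrees) and every entry of Z is (1 - d)/N. The function name `pagerank` and the four-page example graph are hypothetical; this is not Presto code.

```r
# Power-iteration PageRank: repeat p <- M %*% p + Z until p stops changing.
# adj[i, j] = 1 when page j links to page i (columns are link sources).
pagerank <- function(adj, d = 0.85, tol = 1e-10, max_iter = 1000) {
  n <- nrow(adj)
  out_deg <- colSums(adj)
  stopifnot(all(out_deg > 0))            # sketch assumes no dangling pages
  M <- d * sweep(adj, 2, out_deg, "/")   # column-stochastic matrix scaled by d
  Z <- rep((1 - d) / n, n)               # uniform teleport term
  p <- rep(1 / n, n)                     # start from the uniform vector
  for (iter in seq_len(max_iter)) {
    p_new <- as.vector(M %*% p + Z)
    if (sum(abs(p_new - p)) < tol) return(p_new)
    p <- p_new
  }
  p
}

# Hypothetical 4-page graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1, 4 -> 3
adj <- matrix(0, 4, 4)
adj[2, 1] <- 1; adj[3, 1] <- 1; adj[3, 2] <- 1; adj[1, 3] <- 1; adj[3, 4] <- 1
p <- pagerank(adj)
print(round(p, 4))
```

Because M is column-stochastic up to the factor d and Z supplies the remaining 1 - d, the entries of p keep summing to 1 on every iteration; page 4 has no incoming links, so its rank is just its teleport share (1 - d)/N.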

5-9 Breadth-first Search Using Matrices
[Figure: a graph on vertices A, B, C, D and E, its adjacency matrix G, and the BFS vector X after successive steps: (* * * 0 0), then (* * * * 0), then (* * * * *)]
G = adjacency matrix, X = BFS vector.
Simplified algorithm: repeat { X = G*X }
Matrix operations are easy to express and efficient to implement.
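
The BFS-by-multiplication idea can be reproduced with an ordinary dense matrix. A minimal base-R sketch, assuming G[i, j] = 1 for an edge from vertex j to vertex i and a hypothetical undirected graph with edges A-B, A-C, C-D and D-E; the name `bfs_levels` is illustrative. Instead of keeping the cumulative X shown on the slides, it multiplies by the current frontier and records each vertex's hop level, so the reached set grows as {A}, {A, B, C}, {A, B, C, D}, then all five vertices.

```r
# Level-synchronous BFS as repeated matrix-vector products.
# G[i, j] = 1 when there is an edge from vertex j to vertex i.
bfs_levels <- function(G, source) {
  n <- nrow(G)
  level <- rep(NA_integer_, n)   # hop distance from source; NA = not reached
  level[source] <- 0L
  x <- numeric(n)                # indicator vector of the current frontier
  x[source] <- 1
  depth <- 0L
  repeat {
    reached <- as.vector(G %*% x) > 0     # neighbours of the frontier
    frontier <- reached & is.na(level)    # keep only newly discovered vertices
    if (!any(frontier)) break
    depth <- depth + 1L
    level[frontier] <- depth
    x <- as.numeric(frontier)
  }
  level
}

# Hypothetical undirected graph on A..E: A-B, A-C, C-D, D-E
G <- matrix(0, 5, 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))
for (e in list(c(1, 2), c(1, 3), c(3, 4), c(4, 5))) {
  G[e[1], e[2]] <- 1
  G[e[2], e[1]] <- 1
}
lv <- bfs_levels(G, 1)
print(setNames(lv, rownames(G)))
```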

10-12 Linear Algebra on Existing Frameworks
Matrix operations are structured and coarse-grained, and they need global state.
Data-parallel frameworks (MapReduce/Dryad) process each record in parallel; their use case is computing sufficient statistics.
Graph-centric frameworks (Pregel/GraphLab) process each vertex in parallel; their use case is graph models.
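
Sufficient statistics suit data-parallel frameworks because each partition can be summarised independently and the summaries simply added. A minimal base-R sketch of that map/reduce pattern, with hypothetical helper names and three hypothetical partitions; it only illustrates the record-parallel style and is not MapReduce or Dryad code.

```r
# Data-parallel computation of sufficient statistics for a mean and variance.
# "Map": summarise one partition independently of all the others.
partition_stats <- function(x) c(n = length(x), s = sum(x), ss = sum(x^2))

# "Reduce": the per-partition summaries combine by plain addition.
combine_stats <- function(parts) Reduce(`+`, lapply(parts, partition_stats))

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
parts <- split(x, rep(1:3, length.out = length(x)))   # hypothetical 3 partitions
st <- combine_stats(parts)
mean_x <- st[["s"]] / st[["n"]]
# Sample variance from the sums; fine for a toy example, numerically fragile at scale.
var_x <- (st[["ss"]] - st[["n"]] * mean_x^2) / (st[["n"]] - 1)
print(c(mean = mean_x, var = var_x))
```

By contrast, each PageRank iteration needs the whole previous vector on every partition, which is the global state the slide refers to.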

13-18 Challenge 1: Sparse Matrices
[Figure: normalized block density vs. block ID for LiveJournal, Netflix and ClueWeb-1B]
Non-zero elements are spread very unevenly across blocks: some blocks contain ...x more elements than others, which causes computation imbalance.
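
The imbalance can be measured directly by counting non-zero elements per row block. A minimal base-R sketch on a hypothetical 8 x 8 matrix with two dense "hub" rows (the slides' LiveJournal, Netflix and ClueWeb-1B data are not used); `block_nnz` is an illustrative name. With equal-sized blocks of two rows, the first block holds 16 non-zeros and the others 2 each, so whoever processes that block does about 2.9 times the average work.

```r
# Count non-zeros in each block of `block_rows` consecutive rows.
block_nnz <- function(A, block_rows) {
  starts <- seq(1, nrow(A), by = block_rows)
  vapply(starts, function(s) {
    rows <- s:min(s + block_rows - 1, nrow(A))
    sum(A[rows, , drop = FALSE] != 0)
  }, integer(1))
}

# Hypothetical 8 x 8 matrix: two dense rows, the rest diagonal only.
A <- diag(8)
A[1:2, ] <- 1
nnz <- block_nnz(A, 2)
imbalance <- max(nnz) / mean(nnz)   # 1 would mean perfectly balanced blocks
print(nnz)
print(imbalance)
```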

19 Challenge 2: Incremental Updates
New movie ratings should refine recommendations and lead to better suggestions. This requires incremental computation on a consistent view of the data.

20 Presto
A framework for large-scale iterative linear algebra that extends R for scalability and incremental updates.

21 Outline
Motivation; programming model; design; applications and results.

22-27 Programming Model
One data structure: the distributed array, created with darray().
Iteration: foreach runs a Compute function on each partition in parallel.
Incremental updates: onchange and update. When the data is updated, the Compute functions run again.

28-32 PageRank Using Presto
[Figure: P = M * P_old + Z, with M, P_old, Z and P split into blocks of s rows]

    # Create distributed arrays: M (n x n) and P (n x 1), both in blocks of s rows
    M <- darray(dim=c(n,n), blocks=c(s,n))
    P <- darray(dim=c(n,1), blocks=c(s,1))
    while (..) {                  # convergence test elided on the slide
      # Execute function in a cluster, once per partition i
      foreach(i, 1:len,
        # Pass array partitions: block i of P, M and Z, plus all of P_old
        calculate(p=splits(P,i), m=splits(M,i),
                  x=splits(P_old), z=splits(Z,i)) {
          p <- (m %*% x) + z      # %*% is R's matrix product; * is elementwise
        })
      P_old <- P
    }
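
The listing above needs Presto's darray, foreach and splits to run. To show what the row-block decomposition computes, here is a minimal single-machine sketch in base R: the helper names `row_blocks` and `partitioned_pagerank` and the five-page graph are hypothetical, a fixed iteration count replaces the elided convergence test, and `lapply` stands in for the per-partition tasks a cluster would run in parallel.

```r
# Split row indices 1..n into consecutive blocks of s rows.
row_blocks <- function(n, s) split(seq_len(n), ceiling(seq_len(n) / s))

# Each block of P is recomputed from its block of M, its block of Z and the
# whole previous vector P_old, mirroring the foreach/splits call.
partitioned_pagerank <- function(M, Z, s, iters = 50) {
  n <- nrow(M)
  blocks <- row_blocks(n, s)
  P <- rep(1 / n, n)
  for (k in seq_len(iters)) {
    P_old <- P
    parts <- lapply(blocks, function(rows) {
      as.vector(M[rows, , drop = FALSE] %*% P_old + Z[rows])
    })
    P <- unlist(parts, use.names = FALSE)   # reassemble blocks in row order
  }
  P
}

# Hypothetical ring 1 -> 2 -> ... -> 5 -> 1 plus an extra link 1 -> 3.
n <- 5; d <- 0.85
adj <- matrix(0, n, n)
for (j in 1:n) adj[(j %% n) + 1, j] <- 1
adj[3, 1] <- 1
M <- d * sweep(adj, 2, colSums(adj), "/")
Z <- rep((1 - d) / n, n)
P <- partitioned_pagerank(M, Z, s = 2)   # blocks {1, 2}, {3, 4}, {5}
print(round(P, 4))
```

Each block only reads its own rows of M but all of P_old, so the partitioned loop gives the same result as the unpartitioned product M %*% P_old + Z.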

33-35 Incremental PageRank
[Figure: P = M * P_old + Z, with M, P_old, Z and P split into blocks of s rows]

    M <- darray(dim=c(n,n), blocks=c(s,n))
    P <- darray(dim=c(n,1), blocks=c(s,1))
    # Execute when data changes: the handler runs whenever M is updated
    onchange(M) {
      while (..) {
        foreach(i, 1:len,
          calculate(p=splits(P,i), m=splits(M,i),
                    x=splits(P_old), z=splits(Z,i)) {
            p <- (m %*% x) + z
            update(p)             # Update PageRank vector
          })
        P_old <- P
      }
    }
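
One common way to make the re-run cheap, assumed here rather than stated on the slides, is to restart the power iteration from the previous PageRank vector instead of the uniform vector. A minimal base-R sketch with hypothetical helper names (`power_iter`, `make_M`) and a hypothetical six-page graph that gains one link; both starts converge to the same vector, and when the change is small the warm start often needs fewer iterations.

```r
# Power iteration that can start from any vector p0; returns ranks and iterations.
power_iter <- function(M, Z, p0, tol = 1e-10, max_iter = 1000) {
  p <- p0
  for (iter in seq_len(max_iter)) {
    p_new <- as.vector(M %*% p + Z)
    if (sum(abs(p_new - p)) < tol) return(list(p = p_new, iters = iter))
    p <- p_new
  }
  list(p = p, iters = max_iter)
}

# Column-stochastic link matrix scaled by the damping factor d.
make_M <- function(adj, d = 0.85) d * sweep(adj, 2, colSums(adj), "/")

n <- 6; d <- 0.85
Z <- rep((1 - d) / n, n)
adj <- matrix(0, n, n)
adj[1, 2:n] <- 1      # hypothetical: pages 2..6 all link to page 1
adj[2, 1] <- 1        # page 1 links to page 2
old <- power_iter(make_M(adj, d), Z, rep(1 / n, n))

adj[3, 1] <- 1        # the graph changes: page 1 now also links to page 3
cold <- power_iter(make_M(adj, d), Z, rep(1 / n, n))  # recompute from scratch
warm <- power_iter(make_M(adj, d), Z, old$p)          # restart from old ranks
print(c(cold = cold$iters, warm = warm$iters))
```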

37-44 Dynamic Partitioning of Matrices
Presto profiles the execution, then partitions the matrices based on that profile. Programmers specify size invariants.
Result: up to 2x performance improvement.
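
The slides do not give Presto's partitioning algorithm. As an illustration of the idea only, here is a hypothetical greedy partitioner in base R that cuts contiguous row blocks so each holds a similar number of non-zeros; a profiling-based system would replace the non-zero counts with measured execution cost. The name `balanced_row_blocks` and the per-row counts are assumptions.

```r
# Greedy contiguous partitioning of rows into k blocks with similar cost.
# row_nnz: non-zeros (or any measured cost) per row.
balanced_row_blocks <- function(row_nnz, k) {
  n <- length(row_nnz)
  stopifnot(k >= 1, n >= k)
  block <- integer(n)              # block id assigned to each row
  b <- 1L                          # block currently being filled
  load <- 0                        # cost placed in block b so far
  remaining <- sum(row_nnz)        # cost not yet in a closed block
  target <- remaining / k          # fair share for block b
  for (i in seq_len(n)) {
    block[i] <- b
    load <- load + row_nnz[i]
    # Close block b once it has its share, or when each later block
    # needs exactly one of the rows that are left.
    if (b < k && (load >= target || n - i == k - b)) {
      remaining <- remaining - load
      b <- b + 1L
      load <- 0
      target <- remaining / (k - b + 1)   # re-share what is left
    }
  }
  block
}

# Hypothetical per-row non-zero counts: two dense rows, six sparse rows.
row_nnz <- c(8, 8, 1, 1, 1, 1, 1, 1)
blocks <- balanced_row_blocks(row_nnz, 4)
print(tapply(row_nnz, blocks, sum))   # cost per block
```

Splitting the same rows into four equal blocks of two rows gives per-block counts of 16, 2, 2 and 2; the greedy cut gives 8, 8, 3 and 3, halving the largest block.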

45-50 Incremental Updates Using Consistent Snapshots
[Figure: a web graph on pages P, Q, R and S, its adjacency matrix, and the PageRank vector]
A change to the adjacency matrix triggers onchange(m1); the handler recomputes PageRank and publishes it with update as version P1. A later change triggers onchange(m2), and the handler again updates the PageRank vector. Each handler runs on a consistent snapshot of the matrix.

51 Versioned Distributed Arrays
Mechanics of versioning:
update: increments the array's version number.
onchange: binds a version number for the array before executing the handler.
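
These two rules can be mimicked on a single machine by keeping every version of an array. A minimal base-R sketch, assuming an environment-based object and hypothetical function names (`update_array`, `onchange_array`, `snapshot`) chosen so they do not clash with Presto's own update and onchange: each update appends a new version, and each handler is called with the version number and snapshot bound at that moment, so later updates cannot change what it reads.

```r
# A toy versioned array: versions[[v]] holds the value of version v.
new_versioned_array <- function(value) {
  arr <- new.env()
  arr$versions <- list(value)
  arr$handlers <- list()
  arr
}

current_version <- function(arr) length(arr$versions)
snapshot <- function(arr, v) arr$versions[[v]]

# onchange: register a handler to run after every update.
onchange_array <- function(arr, handler) {
  arr$handlers[[length(arr$handlers) + 1]] <- handler
  invisible(arr)
}

# update: increment the version number, then run each handler against
# the snapshot bound to that version.
update_array <- function(arr, value) {
  arr$versions[[length(arr$versions) + 1]] <- value
  v <- current_version(arr)
  for (h in arr$handlers) h(snapshot(arr, v), v)
  invisible(v)
}

m <- new_versioned_array(matrix(0, 2, 2))
seen <- list()   # records (version, sum of snapshot) for each handler call
onchange_array(m, function(snap, v) seen[[length(seen) + 1]] <<- c(v, sum(snap)))
update_array(m, matrix(1, 2, 2))
update_array(m, matrix(2, 2, 2))
print(seen)
```

This sketch keeps every version forever; a real system would also need to reclaim versions that no running handler still reads.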

53-54 Applications Implemented in Presto

Application              Algorithm                 Presto LOC
PageRank                 Eigenvector calculation   41
Triangle counting        Top-K eigenvalues         121
Netflix recommendation   Matrix factorization      130
Centrality measure       Graph algorithm           132
k-path connectivity      Graph algorithm           30
k-means                  Clustering                71
Sequence alignment       Smith-Waterman            64

Every application takes fewer than 140 lines of code.

55-56 Presto is Fast!
[Figure: PageRank per-iteration execution time (sec) vs. number of workers, Presto vs. Hadoop-InMem]
Presto is more than 20x faster than Hadoop with in-memory storage.
Data: 100M nodes, 1.2B edges. Setup: 10G network, 12 cores, 96GB RAM.

57 More in the Paper
Memory management and caching of partitions; scheduling operations; a storage driver interface to HBase; fault tolerance.

58 Conclusion
Linear algebra is a powerful abstraction: it easily expresses machine learning and graph algorithms.
Challenges: sparse matrices and incremental data.
The Presto prototype extends R. Open-source version soon!


More information

Conditioning of the Entries in the Stationary Vector of a Google-Type Matrix. Steve Kirkland University of Regina

Conditioning of the Entries in the Stationary Vector of a Google-Type Matrix. Steve Kirkland University of Regina Conditioning of the Entries in the Stationary Vector of a Google-Type Matrix Steve Kirkland University of Regina June 5, 2006 Motivation: Google s PageRank algorithm finds the stationary vector of a stochastic

More information

Review: From problem to parallel algorithm

Review: From problem to parallel algorithm Review: From problem to parallel algorithm Mathematical formulations of interesting problems abound Poisson s equation Sources: Electrostatics, gravity, fluid flow, image processing (!) Numerical solution:

More information

Graphs, Vectors, and Matrices Daniel A. Spielman Yale University. AMS Josiah Willard Gibbs Lecture January 6, 2016

Graphs, Vectors, and Matrices Daniel A. Spielman Yale University. AMS Josiah Willard Gibbs Lecture January 6, 2016 Graphs, Vectors, and Matrices Daniel A. Spielman Yale University AMS Josiah Willard Gibbs Lecture January 6, 2016 From Applied to Pure Mathematics Algebraic and Spectral Graph Theory Sparsification: approximating

More information

Project 2: Hadoop PageRank Cloud Computing Spring 2017

Project 2: Hadoop PageRank Cloud Computing Spring 2017 Project 2: Hadoop PageRank Cloud Computing Spring 2017 Professor Judy Qiu Goal This assignment provides an illustration of PageRank algorithms and Hadoop. You will then blend these applications by implementing

More information

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03024-7 An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Michel Lesoinne

More information

Data Mining Recitation Notes Week 3

Data Mining Recitation Notes Week 3 Data Mining Recitation Notes Week 3 Jack Rae January 28, 2013 1 Information Retrieval Given a set of documents, pull the (k) most similar document(s) to a given query. 1.1 Setup Say we have D documents

More information

Efficient algorithms for symmetric tensor contractions

Efficient algorithms for symmetric tensor contractions Efficient algorithms for symmetric tensor contractions Edgar Solomonik 1 Department of EECS, UC Berkeley Oct 22, 2013 1 / 42 Edgar Solomonik Symmetric tensor contractions 1/ 42 Motivation The goal is to

More information

The treatment of uncertainty in uniform workload distribution problems

The treatment of uncertainty in uniform workload distribution problems The treatment of uncertainty in uniform workload distribution problems tefan PE KO, Roman HAJTMANEK University of šilina, Slovakia 34 th International Conference Mathematical Methods in Economics Liberec,

More information

Markov Models. CS 188: Artificial Intelligence Fall Example. Mini-Forward Algorithm. Stationary Distributions.

Markov Models. CS 188: Artificial Intelligence Fall Example. Mini-Forward Algorithm. Stationary Distributions. CS 88: Artificial Intelligence Fall 27 Lecture 2: HMMs /6/27 Markov Models A Markov model is a chain-structured BN Each node is identically distributed (stationarity) Value of X at a given time is called

More information

QALGO workshop, Riga. 1 / 26. Quantum algorithms for linear algebra.

QALGO workshop, Riga. 1 / 26. Quantum algorithms for linear algebra. QALGO workshop, Riga. 1 / 26 Quantum algorithms for linear algebra., Center for Quantum Technologies and Nanyang Technological University, Singapore. September 22, 2015 QALGO workshop, Riga. 2 / 26 Overview

More information

A physical model for efficient rankings in networks

A physical model for efficient rankings in networks A physical model for efficient rankings in networks Daniel Larremore Assistant Professor Dept. of Computer Science & BioFrontiers Institute March 5, 2018 CompleNet danlarremore.com @danlarremore The idea

More information

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci Link Analysis Information Retrieval and Data Mining Prof. Matteo Matteucci Hyperlinks for Indexing and Ranking 2 Page A Hyperlink Page B Intuitions The anchor text might describe the target page B Anchor

More information

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples Data Structures Introduction Andres Mendez-Vazquez December 3, 2015 1 / 53 Outline 1 What the Course is About? Data Manipulation Examples 2 What is a Good Algorithm? Sorting Example A Naive Algorithm Counting

More information

Graph Models The PageRank Algorithm

Graph Models The PageRank Algorithm Graph Models The PageRank Algorithm Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2013 The PageRank Algorithm I Invented by Larry Page and Sergey Brin around 1998 and

More information

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information

More information

Algorithmic Primitives for Network Analysis: Through the Lens of the Laplacian Paradigm

Algorithmic Primitives for Network Analysis: Through the Lens of the Laplacian Paradigm Algorithmic Primitives for Network Analysis: Through the Lens of the Laplacian Paradigm Shang-Hua Teng Computer Science, Viterbi School of Engineering USC Massive Data and Massive Graphs 500 billions web

More information

Machine Learning for Data Science (CS4786) Lecture 11

Machine Learning for Data Science (CS4786) Lecture 11 Machine Learning for Data Science (CS4786) Lecture 11 Spectral clustering Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ ANNOUNCEMENT 1 Assignment P1 the Diagnostic assignment 1 will

More information

Diagonalization. MATH 322, Linear Algebra I. J. Robert Buchanan. Spring Department of Mathematics

Diagonalization. MATH 322, Linear Algebra I. J. Robert Buchanan. Spring Department of Mathematics Diagonalization MATH 322, Linear Algebra I J. Robert Buchanan Department of Mathematics Spring 2015 Motivation Today we consider two fundamental questions: Given an n n matrix A, does there exist a basis

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 18: HMMs and Particle Filtering 4/4/2011 Pieter Abbeel --- UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore

More information

ECEN 689 Special Topics in Data Science for Communications Networks

ECEN 689 Special Topics in Data Science for Communications Networks ECEN 689 Special Topics in Data Science for Communications Networks Nick Duffield Department of Electrical & Computer Engineering Texas A&M University Lecture 8 Random Walks, Matrices and PageRank Graphs

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org Challenges

More information

Stat 315c: Introduction

Stat 315c: Introduction Stat 315c: Introduction Art B. Owen Stanford Statistics Art B. Owen (Stanford Statistics) Stat 315c: Introduction 1 / 14 Stat 315c Analysis of Transposable Data Usual Statistics Setup there s Y (we ll

More information

ORIE 4741: Learning with Big Messy Data. Spectral Graph Theory

ORIE 4741: Learning with Big Messy Data. Spectral Graph Theory ORIE 4741: Learning with Big Messy Data Spectral Graph Theory Mika Sumida Operations Research and Information Engineering Cornell September 15, 2017 1 / 32 Outline Graph Theory Spectral Graph Theory Laplacian

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Map Reduce I Map Reduce I 1 / 32 Outline 1. Introduction 2. Parallel

More information

Math 304 Handout: Linear algebra, graphs, and networks.

Math 304 Handout: Linear algebra, graphs, and networks. Math 30 Handout: Linear algebra, graphs, and networks. December, 006. GRAPHS AND ADJACENCY MATRICES. Definition. A graph is a collection of vertices connected by edges. A directed graph is a graph all

More information

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 9, SEPTEMBER 2010 1987 Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE Abstract

More information

THE STATE OF CONTEMPORARY COMPUTING SUBSTRATES FOR OPTIMIZATION METHODS. Benjamin Recht UC Berkeley

THE STATE OF CONTEMPORARY COMPUTING SUBSTRATES FOR OPTIMIZATION METHODS. Benjamin Recht UC Berkeley THE STATE OF CONTEMPORARY COMPUTING SUBSTRATES FOR OPTIMIZATION METHODS Benjamin Recht UC Berkeley MY QUIXOTIC QUEST FOR SUPERLINEAR ALGORITHMS Benjamin Recht UC Berkeley Collaborators Slides extracted

More information

CS246: Mining Massive Data Sets Winter Only one late period is allowed for this homework (11:59pm 2/14). General Instructions

CS246: Mining Massive Data Sets Winter Only one late period is allowed for this homework (11:59pm 2/14). General Instructions CS246: Mining Massive Data Sets Winter 2017 Problem Set 2 Due 11:59pm February 9, 2017 Only one late period is allowed for this homework (11:59pm 2/14). General Instructions Submission instructions: These

More information

CA-SVM: Communication-Avoiding Support Vector Machines on Distributed System

CA-SVM: Communication-Avoiding Support Vector Machines on Distributed System CA-SVM: Communication-Avoiding Support Vector Machines on Distributed System Yang You 1, James Demmel 1, Kent Czechowski 2, Le Song 2, Richard Vuduc 2 UC Berkeley 1, Georgia Tech 2 Yang You (Speaker) James

More information

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,

More information