Using R for Iterative and Incremental Processing
1 Using R for Iterative and Incremental Processing
Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Robert Schreiber
UC Berkeley and HP Labs
2-3 Big Data, Complex Algorithms (6/29/2012)
Machine learning + graph algorithms: iterative linear algebra operations
- PageRank (dominant eigenvector)
- Recommendations (matrix factorization)
- Anomaly detection (top-K eigenvalues)
- User importance (vertex centrality)
4 PageRank Using Matrices
[Figure: matrix M (N x N) and vectors p and Z, drawn in blocks of s rows]
M = modified web graph matrix
p = PageRank vector (dominant eigenvector)
Simplified algorithm: repeat { p = M*p + Z }
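The simplified loop above can be tried on a toy graph. The following is a minimal sketch in plain Python (not Presto code), under assumptions the slide does not state: M is taken to be a damped column-stochastic link matrix, Z a uniform teleport vector, and the damping factor 0.85 and the four-page graph are hypothetical.

```python
def matvec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """links[j] lists the pages that page j links to (no dangling pages here)."""
    n = len(links)
    # Assumed form of the "modified web graph matrix":
    # M[i][j] = damping / outdegree(j) if page j links to page i, else 0.
    M = [[0.0] * n for _ in range(n)]
    for j, targets in enumerate(links):
        for i in targets:
            M[i][j] = damping / len(targets)
    Z = [(1.0 - damping) / n] * n      # assumed uniform teleport term
    p = [1.0 / n] * n
    for _ in range(max_iter):
        # repeat { p = M*p + Z }
        p_new = [mp + z for mp, z in zip(matvec(M, p), Z)]
        if sum(abs(a - b) for a, b in zip(p_new, p)) < tol:
            return p_new
        p = p_new
    return p

links = [[1, 2], [2], [0], [2]]        # hypothetical 4-page web graph
ranks = pagerank(links)
```

Page 2 collects links from three pages and ends up with the largest rank, while page 3, which nothing links to, keeps only the teleport share.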
5-9 Breadth-first Search Using Matrices
G = adjacency matrix
X = BFS vector
Simplified algorithm: repeat { X = G*X }
[Figure: graph with vertices A to E; X over successive iterations is (* * * 0 0), (* * * * 0), (* * * * *)]
Matrix operations:
- Easy to express
- Efficient to implement
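To make the repeat { X = G*X } loop concrete, here is a small sketch in plain Python. It assumes a Boolean matrix-vector product and self-loops on the diagonal of G so that reached vertices stay marked; the edge list is hypothetical, chosen only so that the frontier grows the way the slide's X vectors do.

```python
def bfs_by_matvec(G, start):
    """Repeat X = G*X (Boolean product) until X stops changing.

    Returns the list of X vectors, one per iteration.
    """
    n = len(G)
    X = [1 if i == start else 0 for i in range(n)]
    history = [X]
    while True:
        # Vertex i is reached if any already-reached vertex j is adjacent to it.
        X_new = [1 if any(G[i][j] and X[j] for j in range(n)) else 0
                 for i in range(n)]
        if X_new == X:
            return history
        history.append(X_new)
        X = X_new

names = "ABCDE"
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E")]   # hypothetical edges
n = len(names)
# Assumption: self-loops on the diagonal keep visited vertices marked.
G = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
for u, v in edges:
    i, j = names.index(u), names.index(v)
    G[i][j] = G[j][i] = 1

history = bfs_by_matvec(G, names.index("A"))
```

Each entry of `history` corresponds to one BFS level, so the number of iterations equals the depth of the search plus the starting vector.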
10-12 Linear Algebra on Existing Frameworks
Matrix operations:
- Structured, coarse grained
- Need global state
Data-parallel frameworks (MapReduce/Dryad):
- Process each record in parallel
- Use case: computing sufficient statistics
Graph-centric frameworks (Pregel/GraphLab):
- Process each vertex in parallel
- Use case: graph models
13-18 Challenge 1: Sparse Matrices
[Figure: block density (normalized) vs. block ID for LiveJournal, Netflix, and ClueWeb-1B]
- x more elements
- Computation imbalance
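The imbalance the plot points at can be reproduced on a toy matrix. This is a hedged sketch in plain Python: the 8 x 8 matrix with one dense "hub" row is invented for illustration, and max-over-mean is just one possible way to quantify imbalance, not a metric taken from the slides.

```python
def block_nnz(entries, n_rows, block_rows):
    """Count nonzeros per block of `block_rows` consecutive rows.

    entries: iterable of (row, col) coordinates of nonzero elements.
    """
    n_blocks = (n_rows + block_rows - 1) // block_rows
    counts = [0] * n_blocks
    for row, _col in entries:
        counts[row // block_rows] += 1
    return counts

# Hypothetical skewed matrix: row 0 is a hub with 7 nonzeros,
# every other row has a single nonzero.
entries = [(0, c) for c in range(1, 8)] + [(r, 0) for r in range(1, 8)]
counts = block_nnz(entries, n_rows=8, block_rows=2)

# Ratio of the heaviest block to the average block.
imbalance = max(counts) / (sum(counts) / len(counts))
```

With equal-sized row blocks, the block holding the hub row carries several times the work of the others, which is the situation the slide labels computation imbalance.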
19 Challenge 2: Incremental Updates
- New movie ratings
- Refine recommendations
- Better suggestions
Incremental computation on a consistent view of data
20 Presto
- Framework for large-scale iterative linear algebra
- Extends R for scalability and incremental updates
21 Outline: Motivation, Programming model, Design, Applications and Results
22-27 Programming Model
- One data structure: distributed array, created with darray()
- Iteration: foreach
- Incremental updates: onchange, update
[Figure: data updated, followed by Compute tasks]
28-32 PageRank Using Presto
[Figure: P, M, P_old and Z drawn as arrays partitioned into blocks of s rows]
    # Create distributed array
    M <- darray(dim=c(n,n), blocks=(s,n))
    P <- darray(dim=c(n,1), blocks=(s,1))
    while(..){
      # Execute function in a cluster
      foreach(i, 1:len,
        # Pass array partitions
        calculate(p=splits(p,i), m=splits(m,i),
                  x=splits(p_old), z=splits(z,i)) {
          p <- (m*x)+z
        }
      )
      P_old <- P
    }
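The partitioned loop can be mimicked on one machine to see what each task computes. Below is a rough stand-in in plain Python, not Presto: `split_rows` plays the role of `splits`, a thread pool plays the role of the cluster, and both helper names are hypothetical. Each task receives one row block of M and Z plus the whole previous vector, as in the slide's `calculate` call.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(rows, s):
    """Cut a list of rows into consecutive blocks of s rows."""
    return [rows[k:k + s] for k in range(0, len(rows), s)]

def calculate(m_block, x, z_block):
    """Per-partition work: p_block = m_block * x + z_block."""
    return [sum(a * b for a, b in zip(row, x)) + z
            for row, z in zip(m_block, z_block)]

def iterate(M, Z, p_old, s):
    """One PageRank step, computed block by block in parallel tasks."""
    m_splits, z_splits = split_rows(M, s), split_rows(Z, s)
    with ThreadPoolExecutor() as pool:
        blocks = pool.map(calculate, m_splits,
                          [p_old] * len(m_splits), z_splits)
        # Concatenate the per-block results back into one vector.
        return [v for block in blocks for v in block]

# Tiny made-up inputs, chosen so the result is easy to check by hand.
M = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]
Z = [1.0, 1.0, 1.0, 1.0]
p_old = [1.0, 2.0, 3.0, 4.0]
p_new = iterate(M, Z, p_old, s=2)
```

Because every block needs the full `p_old`, the previous vector is the piece of global state that has to reach all workers on each iteration.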
33-35 Incremental PageRank
[Figure: P, M, P_old and Z drawn as arrays partitioned into blocks of s rows]
    M <- darray(dim=c(n,n), blocks=(s,n))
    P <- darray(dim=c(n,1), blocks=(s,1))
    # Execute when data changes
    onchange(m) {
      while(..){
        foreach(i, 1:len,
          calculate(p=splits(p,i), m=splits(m,i),
                    x=splits(p_old), z=splits(z,i)) {
            p <- (m*x)+z
            # Update page rank vector
            update(p)
          })
        P_old <- P
      }
    }
36 Outline: Motivation, Programming model, Design, Applications and Results
37-44 Dynamic Partitioning of Matrices
- Profile execution
- Partition
- Programmers specify size invariants.
- Up to 2x performance improvement
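The slides do not spell out the partitioning policy, so the following is only one plausible reading of "profile, then partition", sketched in plain Python: a partition is split in half whenever its measured load exceeds a multiple of the median load. The `threshold` parameter, the use of nonzero counts as the profiled load, and the `repartition` helper are all assumptions, and size invariants are not modelled.

```python
from statistics import median

def repartition(partitions, load, threshold=2.0):
    """Split overloaded partitions until none exceeds threshold * median load.

    partitions: list of lists of row indices.
    load: function mapping a list of rows to its profiled cost.
    """
    parts = [list(p) for p in partitions]
    changed = True
    while changed:
        changed = False
        med = median(load(p) for p in parts)
        result = []
        for p in parts:
            if len(p) > 1 and load(p) > threshold * med:
                mid = len(p) // 2
                result += [p[:mid], p[mid:]]   # split the hot partition in two
                changed = True
            else:
                result.append(p)
        parts = result
    return parts

# Hypothetical profile: nonzeros per row stand in for measured execution time.
row_nnz = [4, 4, 1, 1, 1, 1, 1, 1]
initial = [[0, 1], [2, 3], [4, 5], [6, 7]]
balanced = repartition(initial, load=lambda rows: sum(row_nnz[r] for r in rows))
```

Here only the first partition is four times heavier than the median, so it alone is split and the remaining partitions are left untouched.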
45-50 Incremental Updates Using Consistent Snapshots
[Figure: web graph with pages P, Q, R, S, its adjacency matrix, and the PageRank vector]
Sequence shown: onchange(M1), update P1, onchange(M2), update
51 Versioned Distributed Arrays
Mechanics of versioning:
- update: increment the version number
- onchange: bind a version number for the array before executing the handler
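The two rules above can be illustrated with a toy, single-process class. This is a sketch in plain Python under stated assumptions, not Presto's implementation: the `VersionedArray` class and its `snapshot` method are hypothetical names, and keeping every old version in a dictionary is a simplification made for clarity.

```python
class VersionedArray:
    """Toy versioned array: update bumps the version, handlers see a bound version."""

    def __init__(self, data):
        self.version = 0
        self._versions = {0: list(data)}   # assumption: all versions are kept
        self._handlers = []

    def onchange(self, handler):
        """Register handler(data, version) to run whenever the array changes."""
        self._handlers.append(handler)

    def snapshot(self, version):
        """Return the contents as of a given version."""
        return self._versions[version]

    def update(self, data):
        # update: increment the version number.
        self.version += 1
        self._versions[self.version] = list(data)
        # onchange: bind a version number before executing the handler, so
        # the handler reads one consistent snapshot even if later updates arrive.
        bound = self.version
        for handler in self._handlers:
            handler(self.snapshot(bound), bound)

seen = []
ratings = VersionedArray([1, 2])
ratings.onchange(lambda data, version: seen.append((version, sum(data))))
ratings.update([1, 2, 3])
ratings.update([4])
```

Binding the version before the handler runs is what gives each handler invocation a consistent view: it always reads the snapshot it was triggered for, not whatever the latest contents happen to be.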
52 Outline: Motivation, Programming model, Design, Applications and Results
53-54 Applications Implemented in Presto
Application              Algorithm                 Presto LOC
PageRank                 Eigenvector calculation   41
Triangle counting        Top-K eigenvalues         121
Netflix recommendation   Matrix factorization      130
Centrality measure       Graph algorithm           132
k-path connectivity      Graph algorithm           30
k-means                  Clustering                71
Sequence alignment       Smith-Waterman            64
Fewer than 140 lines of code
55-56 Presto is Fast!
[Figure: PageRank per-iteration execution time; time (sec) vs. number of workers, for Presto and Hadoop-InMem]
More than 20x faster than Hadoop (w/ in-memory storage)
Data: 100M nodes, 1.2B edges. Setup: 10G network; 12 cores, 96GB RAM.
57 More in the Paper
- Memory management, caching of partitions
- Scheduling operations
- Storage driver interface to HBase
- Fault tolerance
58 Conclusion
- Linear algebra is a powerful abstraction
- Easily express machine learning, graph algorithms
- Challenges: sparse matrices, incremental data
- Presto prototype extends R
- Open source version soon!
Machine Learning for Data Science (CS4786) Lecture 11 Spectral clustering Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ ANNOUNCEMENT 1 Assignment P1 the Diagnostic assignment 1 will
More informationDiagonalization. MATH 322, Linear Algebra I. J. Robert Buchanan. Spring Department of Mathematics
Diagonalization MATH 322, Linear Algebra I J. Robert Buchanan Department of Mathematics Spring 2015 Motivation Today we consider two fundamental questions: Given an n n matrix A, does there exist a basis
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 18: HMMs and Particle Filtering 4/4/2011 Pieter Abbeel --- UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore
More informationECEN 689 Special Topics in Data Science for Communications Networks
ECEN 689 Special Topics in Data Science for Communications Networks Nick Duffield Department of Electrical & Computer Engineering Texas A&M University Lecture 8 Random Walks, Matrices and PageRank Graphs
More informationCS425: Algorithms for Web Scale Data
CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org Challenges
More informationStat 315c: Introduction
Stat 315c: Introduction Art B. Owen Stanford Statistics Art B. Owen (Stanford Statistics) Stat 315c: Introduction 1 / 14 Stat 315c Analysis of Transposable Data Usual Statistics Setup there s Y (we ll
More informationORIE 4741: Learning with Big Messy Data. Spectral Graph Theory
ORIE 4741: Learning with Big Messy Data Spectral Graph Theory Mika Sumida Operations Research and Information Engineering Cornell September 15, 2017 1 / 32 Outline Graph Theory Spectral Graph Theory Laplacian
More informationBig Data Analytics. Lucas Rego Drumond
Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Map Reduce I Map Reduce I 1 / 32 Outline 1. Introduction 2. Parallel
More informationMath 304 Handout: Linear algebra, graphs, and networks.
Math 30 Handout: Linear algebra, graphs, and networks. December, 006. GRAPHS AND ADJACENCY MATRICES. Definition. A graph is a collection of vertices connected by edges. A directed graph is a graph all
More informationDistributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 9, SEPTEMBER 2010 1987 Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE Abstract
More informationTHE STATE OF CONTEMPORARY COMPUTING SUBSTRATES FOR OPTIMIZATION METHODS. Benjamin Recht UC Berkeley
THE STATE OF CONTEMPORARY COMPUTING SUBSTRATES FOR OPTIMIZATION METHODS Benjamin Recht UC Berkeley MY QUIXOTIC QUEST FOR SUPERLINEAR ALGORITHMS Benjamin Recht UC Berkeley Collaborators Slides extracted
More informationCS246: Mining Massive Data Sets Winter Only one late period is allowed for this homework (11:59pm 2/14). General Instructions
CS246: Mining Massive Data Sets Winter 2017 Problem Set 2 Due 11:59pm February 9, 2017 Only one late period is allowed for this homework (11:59pm 2/14). General Instructions Submission instructions: These
More informationCA-SVM: Communication-Avoiding Support Vector Machines on Distributed System
CA-SVM: Communication-Avoiding Support Vector Machines on Distributed System Yang You 1, James Demmel 1, Kent Czechowski 2, Le Song 2, Richard Vuduc 2 UC Berkeley 1, Georgia Tech 2 Yang You (Speaker) James
More informationCS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine
CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,
More information