Large Graph Mining: Power Tools and a Practitioner s guide
|
|
- Elwin Perry
- 5 years ago
- Views:
Transcription
1 Large Graph Mining: Power Tools and a Practitioner s guide Task 5: Graphs over time & tensors Faloutsos, Miller, Tsourakakis CMU KDD '9 Faloutsos, Miller, Tsourakakis P5-1
2 Outline Introduction Motivation Task 1: Node importance Task 2: Community detection Task 3: Recommendations Task 4: Connection sub-graphs Task 5: Mining graphs over time Task 6: Virus/influence propagation Task 7: Spectral graph theory Task 8: Tera/peta graph mining: hadoop Observations patterns of real graphs Conclusions KDD '9 Faloutsos, Miller, Tsourakakis P5-2
3 Tamara Kolda (Sandia) Thanks to for the foils on tensor definitions, and on TOPHITS KDD '9 Faloutsos, Miller, Tsourakakis P5-3
4 Detailed outline Motivation Definitions: PARAFAC and Tucker Case study: web mining KDD '9 Faloutsos, Miller, Tsourakakis P5-4
5 Examples of Matrices: Authors and terms John Peter Mary Nick... data mining classif. tree KDD '9 Faloutsos, Miller, Tsourakakis P5-5
6 Motivation: Why tensors? Q: what is a tensor? KDD '9 Faloutsos, Miller, Tsourakakis P5-6
7 Motivation: Why tensors? A: N-D generalization of matrix: KDD 9 John Peter Mary Nick... data mining classif. tree KDD '9 Faloutsos, Miller, Tsourakakis P5-7
8 Motivation: Why tensors? A: N-D generalization of matrix: KDD 7 John Peter Mary Nick... KDD 8 KDD 9 data mining classif. tree KDD '9 Faloutsos, Miller, Tsourakakis P5-8
9 Tensors are useful for 3 or more modes Terminology: mode (or aspect ): Mode#3 data mining classif. tree... Mode# Mode (== aspect) #1 KDD '9 Faloutsos, Miller, Tsourakakis P5-9
10 Notice 3 rd mode does not need to be time we can have more than 3 modes Dest. port IP source IP destination KDD '9 Faloutsos, Miller, Tsourakakis P5-1
11 Notice 3 rd mode does not need to be time we can have more than 3 modes Eg, ffmri: x,y,z, time, person-id, task-id From DENLAB, Temple U. (Prof. V. Megalooikonomou +) KDD '9 Faloutsos, Miller, Tsourakakis P5-11
12 Motivating Applications Why tensors are useful? web mining (TOPHITS) environmental sensors Intrusion detection (src, dst, time, dest-port) Social networks (src, dst, time, type-of-contact) face recognition etc KDD '9 Faloutsos, Miller, Tsourakakis P5-12
13 Detailed outline Motivation Definitions: PARAFAC and Tucker Case study: web mining KDD '9 Faloutsos, Miller, Tsourakakis P5-13
14 Tensor basics Multi-mode extensions of SVD recall that: KDD '9 Faloutsos, Miller, Tsourakakis P5-14
15 Reminder: SVD n n m A m Σ V T U Best rank-k approximation in L2 KDD '9 Faloutsos, Miller, Tsourakakis P5-15
16 Reminder: SVD n σ 1 u 1 v 1 σ 2 u 2 v 2 m A + Best rank-k approximation in L2 KDD '9 Faloutsos, Miller, Tsourakakis P5-16
17 Goal: extension to >=3 modes I x J x K I x R K x R C J x R ¼ B = + + A R x R x R KDD '9 Faloutsos, Miller, Tsourakakis P5-17
18 Main points: 2 major types of tensor decompositions: PARAFAC and Tucker both can be solved with ``alternating least squares (ALS) KDD '9 Faloutsos, Miller, Tsourakakis P5-18
19 Specially Structured Tensors Tucker Tensor Kruskal Tensor Our Notation Our Notation core W K x T W K x R I x J x K I x R J x S I x J x K w 1 I x R w R J x R = U R x S x T V = = u 1 v 1 U + + V R x R x R u R v R KDD '9 Faloutsos, Miller, Tsourakakis P5-19
20 Tucker Decomposition - intuition K x T C I x J x K I x R J x S ¼ A R x S x T B author x keyword x conference A: author x author-group B: keyword x keyword-group C: conf. x conf-group G: how groups relate to each other KDD '9 Faloutsos, Miller, Tsourakakis P5-2
21 Intuition behind core tensor 2-d case: co-clustering [Dhillon et al. Information-Theoretic Coclustering, KDD 3] KDD '9 Faloutsos, Miller, Tsourakakis P5-21
22 KDD '9 Faloutsos, Miller, Tsourakakis P5-22 CMU SCS [ ] = m m n n l k k l eg, terms x documents
23 med. doc cs doc term group x doc. group med. terms cs terms common terms term x term-group [ ] doc x doc group KDD '9 Faloutsos, Miller, Tsourakakis P5-23 =
24 Two main tools PARAFAC Tucker Tensor tools - summary Both find row-, column-, tube-groups but in PARAFAC the three groups are identical ( To solve: Alternating Least Squares ) KDD '9 Faloutsos, Miller, Tsourakakis P5-24
25 Detailed outline Motivation Definitions: PARAFAC and Tucker Case study: web mining KDD '9 Faloutsos, Miller, Tsourakakis P5-25
26 Web graph mining How to order the importance of web pages? Kleinberg s algorithm HITS PageRank Tensor extension on HITS (TOPHITS) KDD '9 Faloutsos, Miller, Tsourakakis P5-26
27 Kleinberg s Hubs and Authorities (the HITS method) Sparse adjacency matrix and its SVD: authority scores for 1 st topic authority scores for 2 nd topic to from hub scores for 1 st topic hub scores for 2 nd topic KDD '9 Faloutsos, Miller, Tsourakakis P5-27 Kleinberg, JACM, 1999
28 from CMU SCS authority scores for 1 st topic to HITS Authorities on Sample Data 1st Principal Factor nd Principal Factor.8 www-128.ibm.com.99 We started our crawl from.5 www2.lehigh.edu 3rd Principal Factor.6 and crawled 47 pages,.1 java.sun.com.6 resulting in 56.1 news.com.com cross-linked hosts..36 developers.sun.com.2 4th Principal Factor.24 see.sun.com.2 lewisweb.cc.lehigh.edu docs.sun.com blogs.sun.com.2 fp1.cc.lehigh.edu.31 travel.state.gov 6th Principal Factor.8 sunsolve.sun.com.22 mathpost.asu.edu math.la.asu.edu.8 news.com.com authority scores for 2 nd topic archives.math.utk.edu hub scores KDD '9 Faloutsos, Miller, Tsourakakis for 1 st hub scores topic P5-28 for 2 nd topic
29 Three-Dimensional View of the Web Observe that this tensor is very sparse! KDD '9 Faloutsos, Miller, Tsourakakis P5-29 Kolda, Bader, Kenny, ICDM5
30 Three-Dimensional View of the Web Observe that this tensor is very sparse! KDD '9 Faloutsos, Miller, Tsourakakis P5-3 Kolda, Bader, Kenny, ICDM5
31 Three-Dimensional View of the Web Observe that this tensor is very sparse! KDD '9 Faloutsos, Miller, Tsourakakis P5-31 Kolda, Bader, Kenny, ICDM5
32 Topical HITS (TOPHITS) Main Idea: Extend the idea behind the HITS model to incorporate term (i.e., topical) information. term term scores for 1 st topic term scores for 2 nd topic to from hub scores for 1 st topic authority scores for 1 st topic authority scores for 2 nd topic hub scores for 2 nd topic KDD '9 Faloutsos, Miller, Tsourakakis P5-32
33 Topical HITS (TOPHITS) Main Idea: Extend the idea behind the HITS model to incorporate term (i.e., topical) information. term term scores for 1 st topic term scores for 2 nd topic to from hub scores for 1 st topic authority scores for 1 st topic authority scores for 2 nd topic hub scores for 2 nd topic KDD '9 Faloutsos, Miller, Tsourakakis P5-33
34 TOPHITS Terms & Authorities on Sample Data 1st Principal Factor.23 JAVA.86 java.sun.com.18 SUN.38 developers.sun.com 2nd Principal Factor.17 PLATFORM.16 docs.sun.com.2 NO-READABLE-TEXT.99 TOPHITS uses 3D analysis to find.16 SOLARIS.14 see.sun.com.16 FACULTY.6.16 DEVELOPER rd www2.lehigh.edu Principal Factor the dominant groupings of web.16 SEARCH.15 EDITION.15 NO-READABLE-TEXT pages and terms..16 NEWS.15 DOWNLOAD.15 IBM.7 developer.sun.com.18 LIBRARIES.14 INFO.12 SERVICES.6 sunsolve.sun.com.7 4th www-128.ibm.com Principal Factor.16 COMPUTING.12 SOFTWARE.12 WEBSPHERE.5 access1.sun.com.26 INFORMATION LEHIGH.12 NO-READABLE-TEXT.12 WEB.5 iforce.sun.com.24 FEDERAL DEVELOPERWORKS.23 CITIZEN.1 6th Principal Factor w.11 LINUX.22 OTHER.19 travel.state.gov k = # unique links using term k.26 PRESIDENT.87 RESOURCES.19 CENTER.25 NO-READABLE-TEXT TECHNOLOGIES.19 LANGUAGES.25 BUSH.9 travel.state.gov.1 DOWNLOADS.15 U.S.25 WELCOME th Principal Factor.1 PUBLICATIONS.75 OPTIMIZATION.17 WHITE CONSUMER.58 SOFTWARE.16 U.S FREE.8 DECISION th plato.la.asu.edu Principal Factor.15 HOUSE.7 NEOS.46 ADOBE BUDGET.6 TREE.45 READER PRESIDENTS.5 GUIDE.45 ACROBAT th Principal Factor.11 OFFICE.2 SEARCH.3 FREE.5 WEATHER Tensor PARAFAC.5 ENGINE.3 NO-READABLE-TEXT.24 OFFICE.25 www-fp.mcs.anl.gov.41 CONTROL.29 HERE.23 CENTER.22 lwf.ncdc.noaa.gov.5 ILOG.29 COPY.19 NO-READABLE-TEXT th Principal Factor term scores term scores for 1 st topic for 2.5 DOWNLOAD.17 ORGANIZATION.22 TAX nd topic.15 NWS.17 TAXES.9 travel.state.gov to.15 SEVERE.15 CHILD.7 aviationweather.gov.22 FIRE.15 RETIREMENT authority scores authority scores.15 POLICY.14 BENEFITS for 1 st topic for 2 nd topic.14 CLIMATE.14 STATE.3 hub scores.14 INCOME.3 for 2 KDD '9 hub scores nd topic Faloutsos, Miller, Tsourakakis.13 SERVICE.2 for 1 P5-34 st topic.13 REVENUE.2 CREDIT.1 from term
35 Conclusions Real data are often in high dimensions with multiple aspects (modes) Tensors provide elegant theory and algorithms PARAFAC and Tucker: discover groups KDD '9 Faloutsos, Miller, Tsourakakis P5-35
36 References T. G. Kolda, B. W. Bader and J. P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. In: ICDM 25, Pages , November 25. Jimeng Sun, Spiros Papadimitriou, Philip Yu. Window-based Tensor Analysis on Highdimensional and Multi-aspect Streams, Proc. of the Int. Conf. on Data Mining (ICDM), Hong Kong, China, Dec 26 KDD '9 Faloutsos, Miller, Tsourakakis P5-36
37 Resources See tutorial on tensors, KDD 7 (w/ Tamara Kolda and Jimeng Sun): KDD '9 Faloutsos, Miller, Tsourakakis P5-37
38 Tensor tools - resources Toolbox: from Tamara Kolda: csmr.ca.sandia.gov/~tgkolda/tensortoolbox T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review, Volume 51, Number 3, September 29 csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/tensorreview-preprint.pdf T. Kolda KDD ICDE 9 '9and J. Sun: Scalable Copyright: Faloutsos, Tensor Faloutsos, Miller, Tsourakakis Tong Decomposition (29) for Multi-Aspect P Data Mining (ICDM 28)
Faloutsos, Kolda, Sun
Faloutsos, Kolda, Sun SDM'7 About the tutorial Mining Large Time-evolving Data Using Matrix and Tensor Tools Christos Faloutsos Carnegie Mellon Univ. Tamara G. Kolda Sandia National Labs Jimeng Sun Carnegie
More informationMust-read Material : Multimedia Databases and Data Mining. Indexing - Detailed outline. Outline. Faloutsos
Must-read Material 15-826: Multimedia Databases and Data Mining Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. Technical Report SAND2007-6702, Sandia National Laboratories,
More informationHigher-Order Web Link Analysis Using Multilinear Algebra
Higher-Order Web Link Analysis Using Multilinear Algebra Tamara G. Kolda, Brett W. Bader, and Joseph P. Kenny Sandia National Laboratories Livermore, CA and Albuquerque, NM {tgkolda,bwbader,jpkenny}@sandia.gov
More informationFaloutsos, Tong ICDE, 2009
Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity
More informationCMU SCS Talk 3: Graph Mining Tools Tensors, communities, parallelism
Talk 3: Graph Mining Tools Tensors, communities, parallelism Christos Faloutsos CMU Overall Outline Introduction Motivation Talk#1: Patterns in graphs; generators Talk#2: Tools (Ranking, proximity) Talk#3:
More informationWindow-based Tensor Analysis on High-dimensional and Multi-aspect Streams
Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Jimeng Sun Spiros Papadimitriou Philip S. Yu Carnegie Mellon University Pittsburgh, PA, USA IBM T.J. Watson Research Center Hawthorne,
More informationLarge Graph Mining: Power Tools and a Practitioner s guide
Large Graph Mining: Power Tools and a Practitioner s guide Task 3: Recommendations & proximity Faloutsos, Miller & Tsourakakis CMU KDD'09 Faloutsos, Miller, Tsourakakis P3-1 Outline Introduction Motivation
More informationHub, Authority and Relevance Scores in Multi-Relational Data for Query Search
Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Xutao Li 1 Michael Ng 2 Yunming Ye 1 1 Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology,
More informationTensor Analysis. Topics in Data Mining Fall Bruno Ribeiro
Tensor Analysis Topics in Data Mining Fall 2015 Bruno Ribeiro Tensor Basics But First 2 Mining Matrices 3 Singular Value Decomposition (SVD) } X(i,j) = value of user i for property j i 2 j 5 X(Alice, cholesterol)
More informationFitting a Tensor Decomposition is a Nonlinear Optimization Problem
Fitting a Tensor Decomposition is a Nonlinear Optimization Problem Evrim Acar, Daniel M. Dunlavy, and Tamara G. Kolda* Sandia National Laboratories Sandia is a multiprogram laboratory operated by Sandia
More informationLink Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze
Link Analysis Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 The Web as a Directed Graph Page A Anchor hyperlink Page B Assumption 1: A hyperlink between pages
More informationThanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides
Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank
More informationLink Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci
Link Analysis Information Retrieval and Data Mining Prof. Matteo Matteucci Hyperlinks for Indexing and Ranking 2 Page A Hyperlink Page B Intuitions The anchor text might describe the target page B Anchor
More informationThanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides
Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank
More informationTwo heads better than one: pattern discovery in time-evolving multi-aspect data
Data Min Knowl Disc (2008) 17:111 128 DOI 10.1007/s10618-008-0112-3 Two heads better than one: pattern discovery in time-evolving multi-aspect data Jimeng Sun Charalampos E. Tsourakakis Evan Hoke Christos
More informationLink Analysis Ranking
Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query
More informationDATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS
DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information
More informationCS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine
CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,
More informationTwo heads better than one: Pattern Discovery in Time-evolving Multi-Aspect Data
Two heads better than one: Pattern Discovery in Time-evolving Multi-Aspect Data Jimeng Sun 1, Charalampos E. Tsourakakis 2, Evan Hoke 4, Christos Faloutsos 2, and Tina Eliassi-Rad 3 1 IBM T.J. Watson Research
More informationTalk 2: Graph Mining Tools - SVD, ranking, proximity. Christos Faloutsos CMU
Talk 2: Graph Mining Tools - SVD, ranking, proximity Christos Faloutsos CMU Outline Introduction Motivation Task 1: Node importance Task 2: Recommendations Task 3: Connection sub-graphs Conclusions Lipari
More informationAnomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach
Proceedings 1st International Workshop on Advanced Analytics and Learning on Temporal Data AALTD 2015 Anomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach Anna
More informationHigh Performance Parallel Tucker Decomposition of Sparse Tensors
High Performance Parallel Tucker Decomposition of Sparse Tensors Oguz Kaya INRIA and LIP, ENS Lyon, France SIAM PP 16, April 14, 2016, Paris, France Joint work with: Bora Uçar, CNRS and LIP, ENS Lyon,
More informationPostgraduate Course Signal Processing for Big Data (MSc)
Postgraduate Course Signal Processing for Big Data (MSc) Jesús Gustavo Cuevas del Río E-mail: gustavo.cuevas@upm.es Work Phone: +34 91 549 57 00 Ext: 4039 Course Description Instructor Information Course
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu November 16, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision
More informationCMU SCS. Large Graph Mining. Christos Faloutsos CMU
Large Graph Mining Christos Faloutsos CMU Thank you! Hillol Kargupta NGDM 2007 C. Faloutsos 2 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; fraud
More informationOnline Social Networks and Media. Link Analysis and Web Search
Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information
More informationLINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
LINK ANALYSIS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Retrieval evaluation Link analysis Models
More informationCS246 Final Exam, Winter 2011
CS246 Final Exam, Winter 2011 1. Your name and student ID. Name:... Student ID:... 2. I agree to comply with Stanford Honor Code. Signature:... 3. There should be 17 numbered pages in this exam (including
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart
More informationHyperlinked-Induced Topic Search (HITS) identifies. authorities as good content sources (~high indegree) HITS [Kleinberg 99] considers a web page
IV.3 HITS Hyperlinked-Induced Topic Search (HITS) identifies authorities as good content sources (~high indegree) hubs as good link sources (~high outdegree) HITS [Kleinberg 99] considers a web page a
More informationMatrix Decomposition in Privacy-Preserving Data Mining JUN ZHANG DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF KENTUCKY
Matrix Decomposition in Privacy-Preserving Data Mining JUN ZHANG DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF KENTUCKY OUTLINE Why We Need Matrix Decomposition SVD (Singular Value Decomposition) NMF (Nonnegative
More informationLecture 12: Link Analysis for Web Retrieval
Lecture 12: Link Analysis for Web Retrieval Trevor Cohn COMP90042, 2015, Semester 1 What we ll learn in this lecture The web as a graph Page-rank method for deriving the importance of pages Hubs and authorities
More informationAs it is not necessarily possible to satisfy this equation, we just ask for a solution to the more general equation
Graphs and Networks Page 1 Lecture 2, Ranking 1 Tuesday, September 12, 2006 1:14 PM I. II. I. How search engines work: a. Crawl the web, creating a database b. Answer query somehow, e.g. grep. (ex. Funk
More informationParallel Tensor Compression for Large-Scale Scientific Data
Parallel Tensor Compression for Large-Scale Scientific Data Woody Austin, Grey Ballard, Tamara G. Kolda April 14, 2016 SIAM Conference on Parallel Processing for Scientific Computing MS 44/52: Parallel
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 6: Numerical Linear Algebra: Applications in Machine Learning Cho-Jui Hsieh UC Davis April 27, 2017 Principal Component Analysis Principal
More informationBruce Hendrickson Discrete Algorithms & Math Dept. Sandia National Labs Albuquerque, New Mexico Also, CS Department, UNM
Latent Semantic Analysis and Fiedler Retrieval Bruce Hendrickson Discrete Algorithms & Math Dept. Sandia National Labs Albuquerque, New Mexico Also, CS Department, UNM Informatics & Linear Algebra Eigenvectors
More informationSlides based on those in:
Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering
More informationMultiVis: Content-based Social Network Exploration Through Multi-way Visual Analysis
MultiVis: Content-based Social Network Exploration Through Multi-way Visual Analysis Jimeng Sun, Spiros Papadimitriou, Ching-Yung Lin, Nan Cao, Shixia Liu, Weihong Qian Abstract With the explosion of social
More informationOnline Social Networks and Media. Link Analysis and Web Search
Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information
More informationA Three-Way Model for Collective Learning on Multi-Relational Data
A Three-Way Model for Collective Learning on Multi-Relational Data 28th International Conference on Machine Learning Maximilian Nickel 1 Volker Tresp 2 Hans-Peter Kriegel 1 1 Ludwig-Maximilians Universität,
More informationCS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya
CS 375 Advanced Machine Learning Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya Outline SVD and LSI Kleinberg s Algorithm PageRank Algorithm Vector Space Model Vector space model represents
More informationWhat is this Page Known for? Computing Web Page Reputations. Outline
What is this Page Known for? Computing Web Page Reputations Davood Rafiei University of Alberta http://www.cs.ualberta.ca/~drafiei Joint work with Alberto Mendelzon (U. of Toronto) 1 Outline Scenarios
More informationMultiRank: Co-Ranking for Objects and Relations in Multi-Relational Data
MultiRank: Co-Ranking for Objects and Relations in Multi-Relational Data ABSTRACT Michael K. Ng Department of Mathematics Hong Kong Baptist University Kowloon Tong Hong Kong mng@math.hkbu.edu.hk The main
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 Web pages are not equally important www.joe-schmoe.com
More informationON THE NP-COMPLETENESS OF SOME GRAPH CLUSTER MEASURES
ON THE NP-COMPLETENESS OF SOME GRAPH CLUSTER MEASURES JIŘÍ ŠÍMA AND SATU ELISA SCHAEFFER Academy of Sciences of the Czech Republic Helsinki University of Technology, Finland elisa.schaeffer@tkk.fi SOFSEM
More informationScalable Tensor Factorizations with Incomplete Data
Scalable Tensor Factorizations with Incomplete Data Tamara G. Kolda & Daniel M. Dunlavy Sandia National Labs Evrim Acar Information Technologies Institute, TUBITAK-UEKAE, Turkey Morten Mørup Technical
More informationIntroduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa
Introduction to Search Engine Technology Introduction to Link Structure Analysis Ronny Lempel Yahoo Labs, Haifa Outline Anchor-text indexing Mathematical Background Motivation for link structure analysis
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Graph and Network Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Methods Learnt Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve
More informationLab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018
Lab 8: Measuring Graph Centrality - PageRank Monday, November 5 CompSci 531, Fall 2018 Outline Measuring Graph Centrality: Motivation Random Walks, Markov Chains, and Stationarity Distributions Google
More informationData Mining and Matrices
Data Mining and Matrices 10 Graphs II Rainer Gemulla, Pauli Miettinen Jul 4, 2013 Link analysis The web as a directed graph Set of web pages with associated textual content Hyperlinks between webpages
More information6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search
6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search Daron Acemoglu and Asu Ozdaglar MIT September 30, 2009 1 Networks: Lecture 7 Outline Navigation (or decentralized search)
More informationMultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors
MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors Michael K. Ng Centre for Mathematical Imaging and Vision and Department of Mathematics
More informationCross-lingual and temporal Wikipedia analysis
MTA SZTAKI Data Mining and Search Group June 14, 2013 Supported by the EC FET Open project New tools and algorithms for directed network analysis (NADINE No 288956) Table of Contents 1 Link prediction
More informationCom2: Fast Automatic Discovery of Temporal ( Comet ) Communities
Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities Miguel Araujo 1,5, Spiros Papadimitriou 2, Stephan Günnemann 1, Christos Faloutsos 1, Prithwish Basu 3, Ananthram Swami 4, Evangelos E.
More informationKronecker Product Approximation with Multiple Factor Matrices via the Tensor Product Algorithm
Kronecker Product Approximation with Multiple actor Matrices via the Tensor Product Algorithm King Keung Wu, Yeung Yam, Helen Meng and Mehran Mesbahi Department of Mechanical and Automation Engineering,
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University.
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu What is the structure of the Web? How is it organized? 2/7/2011 Jure Leskovec, Stanford C246: Mining Massive
More informationCVPR A New Tensor Algebra - Tutorial. July 26, 2017
CVPR 2017 A New Tensor Algebra - Tutorial Lior Horesh lhoresh@us.ibm.com Misha Kilmer misha.kilmer@tufts.edu July 26, 2017 Outline Motivation Background and notation New t-product and associated algebraic
More informationBEYOND ASSORTATIVITY PROCLIVITY INDEX FOR ATTRIBUTED NETWORKS (PRONE) Reihaneh Rabbany Dhivya Eswaran*
PROCLIVITY INDEX FOR ATTRIBUTED NETWORKS (PRONE) Reihaneh Rabbany rrabbany@andrew.cmu.edu Artur W. Dubrawski awd@andrew.cmu.edu Dhivya Eswaran* deswaran@cs.cmu.edu Christos Faloutsos christos@cs.cmu.edu
More informationTensor Decompositions and Applications
Tamara G. Kolda and Brett W. Bader Part I September 22, 2015 What is tensor? A N-th order tensor is an element of the tensor product of N vector spaces, each of which has its own coordinate system. a =
More informationHAR: Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search
HAR: Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Abstract Xutao Li Michael K. Ng Yunming Ye In this paper, we propose a framework HAR to study the hub and authority scores
More informationNetworks as vectors of their motif frequencies and 2-norm distance as a measure of similarity
Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity CS322 Project Writeup Semih Salihoglu Stanford University 353 Serra Street Stanford, CA semih@stanford.edu
More informationMulti-Way Compressed Sensing for Big Tensor Data
Multi-Way Compressed Sensing for Big Tensor Data Nikos Sidiropoulos Dept. ECE University of Minnesota MIIS, July 1, 2013 Nikos Sidiropoulos Dept. ECE University of Minnesota ()Multi-Way Compressed Sensing
More information1 Searching the World Wide Web
Hubs and Authorities in a Hyperlinked Environment 1 Searching the World Wide Web Because diverse users each modify the link structure of the WWW within a relatively small scope by creating web-pages on
More informationCS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization
CS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization Tim Roughgarden February 28, 2017 1 Preamble This lecture fulfills a promise made back in Lecture #1,
More informationData Mining and Matrices
Data Mining and Matrices 11 Tensor Applications Rainer Gemulla, Pauli Miettinen July 11, 2013 Outline 1 Some Tensor Decompositions 2 Applications of Tensor Decompositions 3 Wrap-Up 2 / 30 Tucker s many
More informationBig Tensor Data Reduction
Big Tensor Data Reduction Nikos Sidiropoulos Dept. ECE University of Minnesota NSF/ECCS Big Data, 3/21/2013 Nikos Sidiropoulos Dept. ECE University of Minnesota () Big Tensor Data Reduction NSF/ECCS Big
More informationOutline : Multimedia Databases and Data Mining. Indexing - similarity search. Indexing - similarity search. C.
Outline 15-826: Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos Goal: Find similar / interesting things Intro to DB Indexing - similarity search Points Text Time sequences; images
More informationarxiv: v1 [cs.lg] 18 Nov 2018
THE CORE CONSISTENCY OF A COMPRESSED TENSOR Georgios Tsitsikas, Evangelos E. Papalexakis Dept. of Computer Science and Engineering University of California, Riverside arxiv:1811.7428v1 [cs.lg] 18 Nov 18
More informationPV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211
PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 IIR 18: Latent Semantic Indexing Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University,
More informationPage rank computation HPC course project a.y
Page rank computation HPC course project a.y. 2015-16 Compute efficient and scalable Pagerank MPI, Multithreading, SSE 1 PageRank PageRank is a link analysis algorithm, named after Brin & Page [1], and
More informationAnalytic Theory of Power Law Graphs
Analytic Theory of Power Law Graphs Jeremy Kepner This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations
More informationA Randomized Approach for Crowdsourcing in the Presence of Multiple Views
A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion
More informationMalSpot: Multi 2 Malicious Network Behavior Patterns Analysis
MalSpot: Multi 2 Malicious Network Behavior Patterns Analysis Ching-Hao Mao 1, Chung-Jung Wu 1, Evangelos E. Papalexakis 2, Christos Faloutsos 2, and Kuo-Chen Lee 1 1 Institute for Information Industry,
More informationWiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte
Reputation Systems I HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Yury Lifshits Wiki Definition Reputation is the opinion (more technically, a social evaluation) of the public toward a person, a
More informationCloud-Based Computational Bio-surveillance Framework for Discovering Emergent Patterns From Big Data
Cloud-Based Computational Bio-surveillance Framework for Discovering Emergent Patterns From Big Data Arvind Ramanathan ramanathana@ornl.gov Computational Data Analytics Group, Computer Science & Engineering
More informationUsing Rank Propagation and Probabilistic Counting for Link-based Spam Detection
Using Rank Propagation and Probabilistic for Link-based Luca Becchetti, Carlos Castillo,Debora Donato, Stefano Leonardi and Ricardo Baeza-Yates 2. Università di Roma La Sapienza Rome, Italy 2. Yahoo! Research
More informationApplications to network analysis: Eigenvector centrality indices Lecture notes
Applications to network analysis: Eigenvector centrality indices Lecture notes Dario Fasino, University of Udine (Italy) Lecture notes for the second part of the course Nonnegative and spectral matrix
More informationTensor Decompositions for Machine Learning. G. Roeder 1. UBC Machine Learning Reading Group, June University of British Columbia
Network Feature s Decompositions for Machine Learning 1 1 Department of Computer Science University of British Columbia UBC Machine Learning Group, June 15 2016 1/30 Contact information Network Feature
More informationUsing R for Iterative and Incremental Processing
Using R for Iterative and Incremental Processing Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Robert Schreiber UC Berkeley and HP Labs UC BERKELEY Big Data, Complex Algorithms PageRank (Dominant
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 21, 2015 Announcements TA Monisha s office hour has changed to Thursdays 10-12pm, 462WVH (the same
More information<is web> Web Retrieval. Chapter 3: Authority Ranking. ISWeb Information Systems & Semantic Web University of Koblenz Landau. <Nr.
ISWeb Information Systems & Semantic Web University of Koblenz Landau Web Retrieval Chapter 3: Authority Ranking of 20 Chapter 1: Outline Authority Ranking: Motivation Collaboration environments
More informationSVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More informationA Note on Google s PageRank
A Note on Google s PageRank According to Google, google-search on a given topic results in a listing of most relevant web pages related to the topic. Google ranks the importance of webpages according to
More informationWafer Pattern Recognition Using Tucker Decomposition
Wafer Pattern Recognition Using Tucker Decomposition Ahmed Wahba, Li-C. Wang, Zheng Zhang UC Santa Barbara Nik Sumikawa NXP Semiconductors Abstract In production test data analytics, it is often that an
More informationThe Canonical Tensor Decomposition and Its Applications to Social Network Analysis
The Canonical Tensor Decomposition and Its Applications to Social Network Analysis Evrim Acar, Tamara G. Kolda and Daniel M. Dunlavy Sandia National Labs Sandia is a multiprogram laboratory operated by
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu November 3, 2015 Methods to Learn Matrix Data Text Data Set Data Sequence Data Time Series Graph
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 18: Latent Semantic Indexing Hinrich Schütze Center for Information and Language Processing, University of Munich 2013-07-10 1/43
More informationSpectral Graph Theory and You: Matrix Tree Theorem and Centrality Metrics
Spectral Graph Theory and You: and Centrality Metrics Jonathan Gootenberg March 11, 2013 1 / 19 Outline of Topics 1 Motivation Basics of Spectral Graph Theory Understanding the characteristic polynomial
More informationThree right directions and three wrong directions for tensor research
Three right directions and three wrong directions for tensor research Michael W. Mahoney Stanford University ( For more info, see: http:// cs.stanford.edu/people/mmahoney/ or Google on Michael Mahoney
More informationZFL, Center of Remote Sensing of Land Surfaces, University of Bonn, Germany. Geographical Institute, University of Bonn, Germany
Communication of Research Results The IMPETUS Atlas H.-P. Thamm 1, M. Judex 1, O.Schultz 2, S.Krüger 1 & M. Christoph 3 1 ZFL, Center of Remote Sensing of Land Surfaces, University of Bonn, Germany 2 Geographical
More informationGoogle Page Rank Project Linear Algebra Summer 2012
Google Page Rank Project Linear Algebra Summer 2012 How does an internet search engine, like Google, work? In this project you will discover how the Page Rank algorithm works to give the most relevant
More informationComplex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds
Complex Networks CSYS/MATH 303, Spring, 2011 Prof. Peter Dodds Department of Mathematics & Statistics Center for Complex Systems Vermont Advanced Computing Center University of Vermont Licensed under the
More informationSlide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.
Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org #1: C4.5 Decision Tree - Classification (61 votes) #2: K-Means - Clustering
More informationSPARSE TENSORS DECOMPOSITION SOFTWARE
SPARSE TENSORS DECOMPOSITION SOFTWARE Papa S Diaw, Master s Candidate Dr. Michael W. Berry, Major Professor Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville
More informationGigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries
GigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos Carnegie Mellon University {ukang, epapalex, aharpale,
More informationLecture 7 Mathematics behind Internet Search
CCST907 Hidden Order in Daily Life: A Mathematical Perspective Lecture 7 Mathematics behind Internet Search Dr. S. P. Yung (907A) Dr. Z. Hua (907B) Department of Mathematics, HKU Outline Google is the
More informationParCube: Sparse Parallelizable Tensor Decompositions
ParCube: Sparse Parallelizable Tensor Decompositions Evangelos E. Papalexakis, Christos Faloutsos, and Nicholas D. Sidiropoulos 2 School of Computer Science, Carnegie Mellon University, Pittsburgh, PA,
More informationIndex. Copyright (c)2007 The Society for Industrial and Applied Mathematics From: Matrix Methods in Data Mining and Pattern Recgonition By: Lars Elden
Index 1-norm, 15 matrix, 17 vector, 15 2-norm, 15, 59 matrix, 17 vector, 15 3-mode array, 91 absolute error, 15 adjacency matrix, 158 Aitken extrapolation, 157 algebra, multi-linear, 91 all-orthogonality,
More informationCS47300: Web Information Search and Management
CS473: Web Information Search and Management Using Graph Structure for Retrieval Prof. Chris Clifton 24 September 218 Material adapted from slides created by Dr. Rong Jin (formerly Michigan State, now
More information