Node similarity and classification


1 Node similarity and classification Davide Mottin, Anton Tsitsulin Hasso Plattner Institute Graph Mining course Winter Semester 2017

2 Acknowledgements Some parts of this lecture are taken from external material. Other adapted content is from Social Network Data Analytics (Springer), Ed. Charu Aggarwal.

3 SimRank Structural context. Idea: two objects are similar if they are referenced by similar objects.
s(a, b) = C / (|I(a)| |I(b)|) · Σ_{i ∈ I(a)} Σ_{j ∈ I(b)} s(i, j),   with s(a, a) = 1
where C is a decay factor in [0, 1], I(a) are the in-neighbors of a, and |I(a)| |I(b)| is the total number of in-neighbor pairs: s(a, b) is the average similarity between the in-neighbors of a and the in-neighbors of b, scaled by C.
Glen Jeh and Jennifer Widom. SimRank: a measure of structural-context similarity. SIGKDD, 2002.

4 SimRank: Intuition Structural context. Intuition: computing SimRank is like propagating on the graph G^2 = (V^2, E^2) of node-node pairs. 1. The source of similarity is the self-vertices, like (Univ, Univ). 2. Similarity propagates along pair-paths in G^2, away from the sources. 4

5 SimRank for Bipartite Graphs Example: a bipartite graph of people and music videos. For two people A and B, s(A, B) is the average similarity between the out-neighbors (videos) of A and the out-neighbors of B; for two videos c and d, s(c, d) is the average similarity between the in-neighbors (people) of c and the in-neighbors of d. [Jeh, Widom 02; Improvements: Antonellis+ 08 SimRank++, C. Li, Han+ 10, Y. Zhang 13, P. Li+ 14] 5
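
To make the recursion concrete, here is a minimal Python sketch (not from the lecture) of the naive iterative SimRank computation; the toy graph, the decay factor C, and the fixed number of iterations are illustrative assumptions.

```python
import itertools

# Toy directed graph as adjacency lists (edge u -> v); values are assumptions for illustration.
edges = {"Univ": ["ProfA", "ProfB"], "ProfA": ["StudentA"], "ProfB": ["StudentB"],
         "StudentA": ["Univ"], "StudentB": ["ProfB"]}
nodes = sorted(set(edges) | {v for vs in edges.values() for v in vs})
in_nbrs = {n: [u for u, vs in edges.items() if n in vs] for n in nodes}

C = 0.8  # decay factor in [0, 1]
sim = {(a, b): 1.0 if a == b else 0.0 for a, b in itertools.product(nodes, nodes)}

for _ in range(10):                        # fixed number of iterations for simplicity
    new_sim = {}
    for a, b in itertools.product(nodes, nodes):
        if a == b:
            new_sim[(a, b)] = 1.0          # self-similarity is the source of similarity
            continue
        Ia, Ib = in_nbrs[a], in_nbrs[b]
        if not Ia or not Ib:
            new_sim[(a, b)] = 0.0          # no in-neighbors -> similarity 0
            continue
        # average pairwise similarity between the in-neighbors of a and b, scaled by C
        total = sum(sim[(i, j)] for i in Ia for j in Ib)
        new_sim[(a, b)] = C * total / (len(Ia) * len(Ib))
    sim = new_sim

print(sim[("StudentA", "StudentB")])
```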

6 Lecture road Similarity-based Iterative classification Label propagation 6

7 Iterative classification methods Idea: use features that take the neighboring nodes into account and repeat the classification several times until nothing changes. Suppose for each node we have two features: 1. Number of neighbors with class A 2. Number of neighbors without a class. [Figure: learn a classifier on the labeled nodes, apply it to the unlabeled nodes, recompute labels and features, repeat until convergence] Neville, J. and Jensen, D., Iterative classification in relational data. AAAI. 7

8 Iterative Classification Algorithm (ICA) Train a classifier using the labeled instances. Until convergence: apply the classifier to the unlabeled nodes; update the feature vectors of the unlabeled nodes. Return the labels for the unlabeled nodes. Neville, J. and Jensen, D., Iterative classification in relational data. AAAI. 8
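
A minimal sketch of the ICA loop, using the two relational features from the previous slide and scikit-learn's logistic regression as the base classifier; the toy graph, the known labels, and the iteration cap are assumptions added for illustration, not the lecture's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy undirected graph and partial labels (assumptions for illustration).
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
labels = {0: "A", 1: "B"}                      # known labels; nodes 2-4 are unlabeled
unlabeled = [n for n in neighbors if n not in labels]

def features(node, current):
    """Relational features: # neighbors labeled A, # neighbors with no label yet."""
    nbr_labels = [current.get(v) for v in neighbors[node]]
    return [sum(l == "A" for l in nbr_labels), sum(l is None for l in nbr_labels)]

current = dict(labels)
for _ in range(10):                            # iterate until labels stop changing (bounded)
    X_train = [features(n, current) for n in labels]
    y_train = [labels[n] for n in labels]
    clf = LogisticRegression().fit(X_train, y_train)
    new = dict(labels)
    for n in unlabeled:                        # re-predict unlabeled nodes with fresh features
        new[n] = clf.predict([features(n, current)])[0]
    if new == current:
        break
    current = new

print({n: current[n] for n in unlabeled})
```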

9 Extension to multiple labels Each node has a distribution over the labels. To avoid noise, keep only the top-k labels for each unlabeled node, sorted in descending order of confidence. Intuition: remove the least confident labels. Lu, Q. and Getoor, L., Link-based classification. In ICML. 9

10 Lecture road Similarity-based Iterative classification Label propagation 10

11 Guilt-by-Association Techniques Given: a graph and a few labeled nodes. Find: the class (red/green) for the rest of the nodes. Assuming: a network effect (homophily/heterophily). [Figure: example network with fraudster (red), accomplice, and honest (green) nodes] 11

12 Personalized Random Walk with Restarts (RWR) Idea: Propagate labels from a set of nodes to the rest of the graph [Brin+ 98; Haveliwala 03; Tong+ 06; Minkov, Cohen 07] 12

13 PageRank: A kind of random walk [Figure: John with incoming links from Andrew, Bill, and Laura] The importance of John is high if Laura, Andrew, and Bill are also important:
PR(John) = Σ_{v ∈ in-neighbors(John)} PR(v) / out-degree(v) 13

14 Random Walk In a random walk you do the opposite: you assume that the walker moves randomly and chooses one of the neighbors to visit. [Figure: Tom connected to Andrew and Laura, each with transition probability 1/2] 1. Tom chooses Andrew or Laura, each with probability 1/2. 2. Once he chooses one, the number of visits to that node increases. 3. Continue the process until nothing changes anymore (at a probabilistic level). John will receive many visits, since many nodes are connected to him. 14
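
A small simulation of the plain random walk described above (illustrative only; the friendship graph, the seed, and the number of steps are assumptions): the walker repeatedly jumps to a uniformly random neighbor and the visit counts approximate each node's importance.

```python
import random
from collections import Counter

# Toy friendship graph (assumption): John has several in-links, so he should collect many visits.
neighbors = {"Tom": ["Andrew", "Laura"], "Andrew": ["John", "Tom"],
             "Laura": ["John", "Tom"], "John": ["Andrew", "Laura"]}

random.seed(0)
current = "Tom"
visits = Counter()
for _ in range(100_000):                           # long walk so relative frequencies stabilize
    current = random.choice(neighbors[current])    # move to a uniformly random neighbor
    visits[current] += 1

total = sum(visits.values())
print({node: round(count / total, 3) for node, count in visits.items()})
```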

15 Personalized RWR Now assume that with probability c you perform another move and with probability (1 - c) you jump back to Tom. [Figure: Tom connected to Andrew and Laura] Therefore the probability that the walker is at Tom's node will be
Prob(Tom, t) = c · ( Prob(Andrew, t-1) / |N(Andrew)| + Prob(Laura, t-1) / |N(Laura)| ) + (1 - c)
where Prob(Andrew, t-1) is the probability of visiting Andrew at time (t-1), |N(·)| is the number of Andrew's and Laura's neighbors, and (1 - c) is the probability of jumping back to Tom. 15

16 Comparing two nodes Start two random walks, one from each of the two nodes you want to compare. Compare the final scores you obtain for each node in the graph using some vector comparison (e.g., cosine similarity, KL-divergence). Example over [Tom, Andrew, Alice, John, ..., Paul]: Vector for Tom = [0.2, 0.2, 0.1, 0.3, ..., 0.01]; Vector for John = [0.05, 0.15, 0.2, 0.2, ..., 0.2]. Compare the two vectors (e.g., subtract). 16
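
A sketch of this comparison under assumptions (toy graph, restart parameter c, number of iterations): compute a personalized RWR vector for each of the two seed nodes by iterating the lecture's fixed-point equation, then compare the two vectors with cosine similarity.

```python
import numpy as np

# Toy undirected graph over [Tom, Andrew, Alice, John, Paul] (adjacency is an assumption).
names = ["Tom", "Andrew", "Alice", "John", "Paul"]
A = np.array([[0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0],
              [0, 1, 0, 1, 1],
              [1, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)       # D^{-1} A, as in the lecture's equation

def rwr(seed, c=0.85, iters=100):
    """Iterate the lecture's fixed point x = c * D^{-1} A x + (1 - c) * y."""
    y = np.zeros(len(names))
    y[names.index(seed)] = 1.0             # restart at the seed node
    x = y.copy()
    for _ in range(iters):
        x = c * W @ x + (1 - c) * y
    return x

v_tom, v_john = rwr("Tom"), rwr("John")
cosine = v_tom @ v_john / (np.linalg.norm(v_tom) * np.linalg.norm(v_john))
print(np.round(v_tom, 3), np.round(v_john, 3), round(float(cosine), 3))
```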

17 Personalized RWR Now assume that with probability c you perform another move and with probability (1 - c) you jump back to Tom (the restart probability).
[I - c D^{-1} A] x = (1 - c) y
where D^{-1} A encodes the graph structure, x is the relevance vector, and y is the starting vector. [Figure: small example graph with its transition matrix] 17

18 Personalized RWR Personalized RWR is defined as the probability that a random surfer reaches a node n after wandering in the graph for a long time. After some time the value for n (and for all the other nodes) does not change anymore; technically speaking, the random walk converges to a stationary distribution. Let A be the adjacency matrix of the graph G = (V, E):
x = c D^{-1} A x + (1 - c) y
where D is a diagonal matrix with the node degrees on the diagonal, y is a vector containing the probability of starting from each of the nodes, and x is the probability of reaching each node in the graph; dim(x) = dim(y) = |V|. What is D^{-1} A? Recall that the inverse of a diagonal matrix is a diagonal matrix containing the reciprocals of the elements on the diagonal. We need to find x:
I x = c D^{-1} A x + (1 - c) y
I x - c D^{-1} A x = (1 - c) y
(I - c D^{-1} A) x = (1 - c) y
x = (I - c D^{-1} A)^{-1} (1 - c) y 18
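
The closed form derived on this slide can be evaluated directly with one linear solve; below is a hedged sketch on an assumed toy graph that also checks the solution against the fixed-point equation.

```python
import numpy as np

# Toy undirected graph (assumption); c is the probability of continuing the walk.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))       # inverse of the degree matrix D
c = 0.85
y = np.array([1.0, 0.0, 0.0, 0.0])         # restart (starting) distribution: node 0

n = len(y)
# Closed form from the slide: x = (I - c D^{-1} A)^{-1} (1 - c) y
x = np.linalg.solve(np.eye(n) - c * D_inv @ A, (1 - c) * y)

# Sanity check: x is a fixed point of x = c D^{-1} A x + (1 - c) y
assert np.allclose(x, c * D_inv @ A @ x + (1 - c) * y)
print(np.round(x, 3))
```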

19 Another way of looking at the RW x = c D^{-1} A x + (1 - c) y Suppose the surfer starts from the beginning: it chooses one node at random according to y. In each step it will either, with probability (1 - c), jump to another starting node, or, with probability c, move to one of the neighbors. Why? Think about the process without restart, setting W = D^{-1} A:
x_1 = W x_0
x_2 = W x_1 = W W x_0 = W^2 x_0
...
x_n = W^n x_0
That means that (W^n)_{ij} contains the probability of reaching j starting from i in n steps. 19

20 Semi-Supervised Learning (SSL) Idea: if you have few labeled nodes, exploit the structure and the homophily (similarity) between nodes. Graph-based setting: few labeled nodes; edges encode the similarity between nodes. Inference: exploit neighborhood information. [Figure: two-step example propagating labels to neighboring nodes] Ji, M., Sun, Y., Danilevsky, M., Han, J. and Gao, J., Graph regularized transductive classification on heterogeneous information networks. ECML/PKDD. 20

21 SSL Equation
[I + a(D - A)] x = y
where D - A is the graph Laplacian (graph structure), x are the final labels, y are the known labels, and a is the homophily strength of the neighbors (~ stiffness of a spring). [Figure: spring analogy with example label values] 21

22 SSL Equation What does it compute? [I + a(D - A)] x = y Hard? Let's unroll it!
x = -a(D - A) x + y = a A x - a D x + y
x_i = a Σ_j A_ij x_j + (y_i - a D_ii x_i)
The first term sums the labels from the neighbors; the second term is the difference between the learned label at the node and the input label. 22

23 When is RWR = SSL? RWR: [I - c D^{-1} A] x = (1 - c) y. SSL: [I + a(D - A)] x = y. The two solutions coincide for every y when
(1 - c) [I - c D^{-1} A]^{-1} = [I + a(D - A)]^{-1}
1/(1 - c) [I - c D^{-1} A] = I + a(D - A)
1/(1 - c) I - c/(1 - c) D^{-1} A = I + a(D - A)
c/(1 - c) I - c/(1 - c) D^{-1} A = a(D - A)
c/(1 - c) (I - D^{-1} A) = a(D - A)
c/(1 - c) (I - (1/d) A) = a(D - A)     (*all nodes have degree d, so D = d I)
c/(1 - c) (D - A) = a d (D - A)
c / (d (1 - c)) = a
We assume the graph is d-regular*. 23
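
To see the claimed equivalence concretely, the sketch below builds a small d-regular graph (a 5-node cycle, so d = 2), sets a = c / (d(1 - c)) as derived above, and checks numerically that the RWR and SSL solutions coincide; the graph, c, and the label vector are assumptions.

```python
import numpy as np

# 5-node cycle: every node has degree d = 2, so the graph is d-regular.
n, d = 5, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0
D = np.diag(A.sum(axis=1))

c = 0.7                                    # RWR restart parameter (assumption)
a = c / (d * (1 - c))                      # the value derived on the slide
y = np.array([1.0, 0.0, 0.0, 1.0, 0.0])   # known labels / restart vector (assumption)

x_rwr = np.linalg.solve(np.eye(n) - c * np.linalg.inv(D) @ A, (1 - c) * y)
x_ssl = np.linalg.solve(np.eye(n) + a * (D - A), y)

print(np.round(x_rwr, 4))
print(np.round(x_ssl, 4))
assert np.allclose(x_rwr, x_ssl)           # identical solutions on a d-regular graph
```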

24 Belief Propagation Iterative message-based method. Propagation matrix ψ: entry ψ(class of sender, class of receiver) encodes how likely a node of the sender's class is to have a neighbor of the receiver's class (homophily: large values on the diagonal). Messages are exchanged round after round until a stop criterion is fulfilled.
- Pearl, J., Reverend Bayes on inference engines: A distributed hierarchical approach. In AAAI.
- Yedidia, J.S., Freeman, W.T. and Weiss, Y., Understanding belief propagation and its generalizations. Exploring artificial intelligence in the new millennium. 24

25 Belief Propagation Iterative message-based method. Propagation matrix ψ(class of sender, class of receiver): under homophily the diagonal entries are large (neighbors tend to share the same class); under heterophily the off-diagonal entries are large (neighbors tend to have different classes).

26 Belief Propagation Equations message(i → j) ∝ belief(i) × homophily strength: node i sends to its neighbor j its current belief, weighted by the propagation (homophily) strength. 26

27 Belief Propagation Equations belief of i:
b_i(x_i) ∝ φ_i(x_i) Π_{j ∈ N(i)} m_{ji}(x_i)
where φ_i(x_i) is the prior belief of node i and the m_{ji}(x_i) are the messages from its neighbors. 27

28 Fast Belief Propagation Original BP [Yedidia+]:
m_{ij}(x_j) = Σ_{x_i} φ_i(x_i) ψ_{ij}(x_i, x_j) Π_{n ∈ N(i)\j} m_{ni}(x_i)
b_i(x_i) = η φ_i(x_i) Π_{j ∈ N(i)} m_{ji}(x_i)
(non-linear)
FaBP [Koutra+]: Linearized BP. BP is approximated by the linear system
[I + a D - c' A] b_h = φ_h
where φ_h are the prior beliefs and b_h the final beliefs (linear).
Koutra, D., Ke, T.Y., Kang, U., Chau, D.H.P., Pao, H.K.K. and Faloutsos, C., Unifying guilt-by-association approaches: Theorems and fast algorithms. PKDD 28

29 Wait, how? Unrolling the FaBP linear system recovers a message-passing view:
[I + a D - c' A] b_h = φ_h
I b_h + a D b_h - c' A b_h = φ_h
b_h = -(a D - c' A) b_h + φ_h
b_h(i) = c' Σ_j A_ij b_h(j) - a D_ii b_h(i) + φ_h(i)
The term c' Σ_j A_ij b_h(j) plays the role of the exchanged messages. 29
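
A minimal sketch of the FaBP idea: one linear solve of [I + a D - c' A] b_h = φ_h replaces all message passing. The toy graph, the "about-half" priors, and the parameter values are assumptions; the formulas for a and c' follow the parameterization I recall from the FaBP paper (derived from a homophily factor h_h), so treat them as assumptions too.

```python
import numpy as np

# Toy undirected graph (assumption): nodes 0-4 with a few edges.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
n = A.shape[0]

# "About-half" priors: +0.05 leans class 1, -0.05 leans class 2, 0 = unknown (assumption).
phi = np.array([0.05, 0.0, 0.0, 0.0, -0.05])

# Parameters derived from an assumed homophily factor hh; the exact formulas are my
# recollection of the FaBP parameterization and should be treated as an assumption.
hh = 0.15
a = 4 * hh ** 2 / (1 - 4 * hh ** 2)
c_prime = 2 * hh / (1 - 4 * hh ** 2)

# One linear solve replaces all message passing: [I + a D - c' A] b = phi
b = np.linalg.solve(np.eye(n) + a * D - c_prime * A, phi)
print(np.round(b, 4))       # the sign of b[i] gives the predicted class of node i
```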

30 Qualitative Comparison of GBA Methods

Method | Heterophily | Scalability | Convergence
RWR    | no          | yes         | yes
SSL    | no          | yes         | yes
BP     | yes         | yes         | ?
FABP   | yes         | yes         | yes

30

31 Correspondence of Methods (RWR = Random Walk with Restart, SSL = Semi-supervised Learning, BP/FABP = Belief Propagation)

Method | Matrix           | unknown | known
RWR    | [I - c D^{-1} A] | x       | (1 - c) y
SSL    | [I + a (D - A)]  | x       | y
FABP   | [I + a D - c' A] | b_h     | φ_h

31

32 Questions? 32

33 References
Lorrain, F. and White, H.C. Structural equivalence of individuals in social networks. The Journal of Mathematical Sociology.
Borgatti, S.P. and Everett, M.G. Notions of position in social network analysis. Sociological Methodology.
Borgatti, S.P. and Everett, M.G. Regular blockmodels of multiway, multimode matrices. Social Networks.
Henderson, K., Gallagher, B., Eliassi-Rad, T., Tong, H., Basu, S., Akoglu, L., Koutra, D., Faloutsos, C. and Li, L. RolX: structural role extraction & mining in large graphs. SIGKDD.
Henderson, K., Gallagher, B., Li, L., Akoglu, L., Eliassi-Rad, T., Tong, H. and Faloutsos, C. It's who you know: graph mining using recursive structural features. SIGKDD.
Tong, H., Koren, Y. and Faloutsos, C. Fast direction-aware proximity for graph mining. SIGKDD.
Jeh, G. and Widom, J. SimRank: a measure of structural-context similarity. SIGKDD, 2002.
Ji, M., Sun, Y., Danilevsky, M., Han, J. and Gao, J. Graph regularized transductive classification on heterogeneous information networks. ECML/PKDD.
Pearl, J. Reverend Bayes on inference engines: A distributed hierarchical approach. AAAI.
Yedidia, J.S., Freeman, W.T. and Weiss, Y. Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium.
Koutra, D., Ke, T.Y., Kang, U., Chau, D.H.P., Pao, H.K.K. and Faloutsos, C. Unifying guilt-by-association approaches: Theorems and fast algorithms. PKDD.
Neville, J. and Jensen, D. Iterative classification in relational data. AAAI.
Lu, Q. and Getoor, L. Link-based classification. ICML.
33
