How to exploit network properties to improve learning in relational domains

How to exploit network properties to improve learning in relational domains. Jennifer Neville, Departments of Computer Science and Statistics, Purdue University (joint work with Brian Gallagher, Timothy La Fond, Sebastian Moreno, Joseph Pfeiffer and Rongjing Xiang)

Relational network classification examples: predict organizational roles from communication patterns (email networks); predict protein function from interaction patterns (gene/protein networks); predict paper topics from properties of cited papers (scientific networks); predict content changes from properties of hyperlinked pages (world wide web); predict personal preferences from characteristics of friends (social networks); predict group effectiveness from communication patterns (organizational networks).

Network data is heterogeneous and interdependent, partially observed/labeled, dynamic and/or non-stationary, and often drawn from a single network... thus many traditional ML methods developed for i.i.d. data do not apply.

Machine learning: (1) Data representation (choose); (2) Knowledge representation (choose), whose generic form is y = b_1 x_1 + b_2 x_2 + ... + b_0.

Machine learning: the knowledge representation defines a model space, which is combined with (3) the objective function (choose), e.g. least squares: L_sq(D) = sum_{i=1}^{N_D} (f(x_i) - y_i)^2.

Machine learning: these are combined with (4) the search algorithm (e.g., optimization). Learning identifies the model with the best objective-function value on the training data; the model is then applied for prediction on new data from the same distribution.
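
To make the four components concrete, here is a minimal sketch on i.i.d. data, assuming a linear knowledge representation and the squared-error objective above; the data and names are illustrative, not from the talk.

    # Minimal i.i.d. sketch: (1) tabular data representation, (2) linear
    # knowledge representation y = b0 + b1*x1 + b2*x2, (3) squared-error
    # objective L_sq, (4) least-squares search. Illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                   # (1) data representation: feature matrix
    true_beta = np.array([1.0, 2.0, -1.0])          # intercept, b1, b2 (ground truth for the toy data)
    y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=100)

    Xd = np.column_stack([np.ones(len(X)), X])      # (2) knowledge representation: linear model

    def l_sq(beta):                                 # (3) objective: sum of squared errors
        return np.sum((Xd @ beta - y) ** 2)

    beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # (4) search: closed-form least squares
    print(l_sq(beta_hat), beta_hat)                 # fitted model is then applied to new data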

Relational learning uses the same four components: (1) data representation, now relational data (email networks, social networks, scientific networks, gene/protein networks, world wide web, organizational networks); (2) knowledge representation, now relational models; (3) objective function; (4) search algorithm. [Figure: example relational model over Firm, Broker (Bk), Branch (Bn) and Disclosure entities with attributes such as Size, Region, Area, Layoffs, Has Business, Problem In Past, Is Problem, On Watchlist, Year, Type.]

There has been a great deal of work on templated graphical model representations for relational data: RBNs, PRMs, RMNs, IHRMs, MLNs, DAPER, GMNs, RDNs. Since the model representation is also graphical, we need to distinguish data networks from model networks.

Data network

Data network (each node has attributes to observe or predict: Gender? Married? Politics? Religion?)

Data representation: the relational learning task is, e.g., to predict political views based on a user's intrinsic attributes and the political views of friends. Estimate the joint distribution P(Y | {X}^n, G) or the conditional distribution P(Y_i | X_i, X_R, Y_R) over the attributed network. [Figure: attributed network with node attribute values such as gender (M/F), married (Y/N), and politics (D/C).] Note we often have only a single network for learning.

Define the structure of the graphical model with a relational template: a pairwise dependency (Politics_i, Politics_j) for each linked pair of users, and local dependencies (Politics_i, Gender_i), (Politics_i, Married_i), (Politics_i, Religion_i) for each user.

In generic form, the relational template becomes a model template: a pairwise factor (Y_i, Y_j) for each linked pair and local factors (Y_i, X_i) for each node.

Model template + data network: the template is rolled out over the data network to instantiate the model.

Knowledge representation: the rolled-out model network (graphical model), with a label variable Y_i and attribute variables X_i for every node in the data network.

(3) Objective function and (4) search (e.g., convex optimization): learn the model parameters from a fully labeled network, P(y_G | x_G) = (1/Z(Theta, x_G)) * prod_{T in T} prod_{C in C(T(G))} phi_T(x_C, y_C; theta_T), where the product runs over the templates T and the cliques C that each template instantiates in the data graph G.
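
A hedged sketch of how the templated potentials phi_T roll out over a hypothetical 3-node data network to give the unnormalized score inside P(y_G | x_G); the potential tables below are illustrative, not learned values.

    # Hedged sketch: unnormalized score of one labeling under a rolled-out
    # templated Markov network. The pairwise and node potentials are
    # hypothetical; in practice they are learned from the labeled network.
    import numpy as np

    edges = [(0, 1), (1, 2)]                 # tiny data network, 3 nodes
    x = np.array([0, 1, 1])                  # observed node attributes (binary)
    y = np.array([1, 1, 0])                  # candidate labels (binary)

    # One pairwise template phi_YY[y_i, y_j] shared by every edge,
    # one node template phi_XY[x_i, y_i] shared by every node.
    phi_YY = np.array([[1.5, 0.5],
                       [0.5, 1.5]])          # favors same-label neighbors (autocorrelation)
    phi_XY = np.array([[2.0, 0.5],
                       [0.5, 2.0]])          # favors labels that match the attribute

    log_score = sum(np.log(phi_YY[y[i], y[j]]) for i, j in edges) \
              + sum(np.log(phi_XY[x[i], y[i]]) for i in range(len(y)))
    # P(y_G | x_G) is proportional to exp(log_score); Z(Theta, x_G) normalizes
    # over all 2^3 labelings of this toy graph.
    print(log_score)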

Apply the model to make predictions in another network drawn from the same distribution: the model template (pairwise (Y_i, Y_j) and local (Y_i, X_i) factors) is rolled out over the test network.

Collective classification uses the full joint, rolled-out model for inference, but labeled nodes impact the final model structure. [Figure: rolled-out model network with a subset of labeled nodes.]

The structure of rolled-out relational graphical models is determined by the structure of the underlying data network, including the location and availability of labels; this can impact the performance of learning and inference methods via the representation, the objective function, and the search algorithm.

Networks are much, much larger in practice

Finding 1: Representation. The implicit assumption is that nodes of the same type should be identically distributed, but many relational representations cannot ensure this holds for varying graph structures.

I.I.D. assumption revisited: current relational models do not impose the same marginal invariance condition that is assumed for IID models, which can impair generalization. For two nodes A and E with different local graph structure, p(y_A | x_A) != p(y_E | x_E). The Markov relational network representation does not allow us to explicitly specify the form of the marginal probability distributions, so it is difficult to impose any equality constraints on the marginals.

Is there an alternative approach? Goal: combine the marginal invariance advantages of IID models with the ability to model relational dependence, and incorporate node attributes in a general way (similar to IID classifiers). Idea: apply copulas to combine marginal models with a dependence structure. Copula theory: we can construct an n-dimensional vector (z_1, ..., z_n) with arbitrary marginals while preserving the desired dependence structure; (t_1, ..., t_n) is sampled jointly from the copula, z_i = F_i^{-1}(Phi_i(t_i)), and each z_i is marginally distributed according to F_i.
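
A minimal sketch of this copula construction, assuming a Gaussian copula for the joint dependency and arbitrary illustrative marginals; it only demonstrates that the dependence and the marginals can be controlled separately.

    # Hedged sketch of z_i = F_i^{-1}(Phi_i(t_i)) with a Gaussian copula:
    # t is jointly multivariate normal (the dependence), Phi_i maps t_i to a
    # uniform u_i, and the quantile function of an arbitrary marginal F_i maps
    # u_i to z_i. The correlation matrix and marginals are illustrative.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    corr = np.array([[1.0, 0.8, 0.3],
                     [0.8, 1.0, 0.3],
                     [0.3, 0.3, 1.0]])        # desired dependence structure
    t = rng.multivariate_normal(np.zeros(3), corr, size=50_000)   # t jointly ~ copula
    u = stats.norm.cdf(t)                                         # Phi_i(t_i): uniform [0,1]

    # Arbitrary marginals F_i: exponential, logistic (loc=2), standard normal.
    z = np.column_stack([stats.expon.ppf(u[:, 0]),
                         stats.logistic.ppf(u[:, 1], loc=2.0),
                         stats.norm.ppf(u[:, 2])])

    print(np.corrcoef(z, rowvar=False).round(2))   # dependence is preserved
    print(z.mean(axis=0).round(2))                 # each z_i marginally ~ F_i (means near 1, 2, 0)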

Let's start with a reformulation of IID classifiers. The general form of probabilistic binary classification (e.g., logistic regression) is p(y_i = 1) = F(theta(x_i)). Now view F as the CDF of a distribution symmetric around 0 to obtain a latent variable formulation: p(z_i = z | x_i = x) = f(z - theta(x_i)) and y_i = sign(z_i), where z_i is a continuous variable capturing random effects that are not present in x, and f is the PDF corresponding to F. In IID models the random effect for each instance is independent and can therefore be integrated out. When links among instances are observed, the correlations among their class labels can be modeled through dependence among the z's. Key question: how to model the dependence among the z's while preserving the marginals?
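
A small Monte Carlo check of this latent-variable reformulation, assuming a logistic F and an illustrative (hypothetical) linear predictor theta(x):

    # Hedged sketch: with z_i = theta(x_i) + eps, eps ~ standard logistic
    # (symmetric about 0), P(sign(z_i) = +1) recovers p(y_i = 1) = F(theta(x_i)).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    theta = lambda x: 0.7 * x - 0.2                              # hypothetical linear predictor
    x = 1.5

    direct = stats.logistic.cdf(theta(x))                        # p(y = 1) = F(theta(x))
    z = theta(x) + rng.logistic(size=200_000)                    # latent z with pdf f(z - theta(x))
    latent = np.mean(np.sign(z) > 0)                             # P(y = sign(z) = +1)
    print(round(direct, 3), round(latent, 3))                    # the two agree up to MC error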

Copula Latent Markov Network (CLMN): combines IID classifiers with a relational dependence structure. The CLMN model: (1) sample t from the desired joint dependency, (t_1, t_2, ..., t_n); (2) apply the marginal transformation to obtain the latent variable z, z_i = F_i^{-1}(Phi_i(t_i)), where the marginal Phi_i transforms t_i to a uniform [0,1] r.v. u_i, the quasi-inverse of the CDF F_i is used to obtain z_i from u_i, and the attributes moderate the corresponding pdf f_i; (3) classification: y_i = sign(z_i).
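
A hedged sketch of the three CLMN steps on a toy graph, using a graph-Laplacian-based Gaussian Markov network for the dependence and logistic marginals whose location is set by the attributes; this is an illustrative parameterization, not the paper's exact model.

    # Hedged sketch of the CLMN generative steps: (1) sample dependent t from a
    # graph-structured Gaussian Markov network, (2) push each t_i through Phi_i
    # and the quasi-inverse of a logistic marginal located at theta(x_i),
    # (3) classify by sign. Precision matrix and predictor are illustrative.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]         # toy data network
    n = 4
    x = rng.normal(size=n)                           # node attributes
    theta = 1.2 * x                                  # marginal (IID-style) predictor

    # Gaussian Markov network: precision from the graph Laplacian plus a ridge.
    Q = np.eye(n) * 0.1
    for i, j in edges:
        Q[i, i] += 1.0; Q[j, j] += 1.0
        Q[i, j] -= 1.0; Q[j, i] -= 1.0
    cov = np.linalg.inv(Q)
    d = np.sqrt(np.diag(cov))
    t = rng.multivariate_normal(np.zeros(n), cov)    # step 1: dependent t

    u = stats.norm.cdf(t / d)                        # step 2: Phi_i(t_i) -> uniform u_i
    z = stats.logistic.ppf(u, loc=theta)             #          quasi-inverse F_i^{-1}(u_i)
    y = np.sign(z)                                   # step 3: classification
    print(y)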

Copula Latent Markov Network (Xiang and Neville, WSDM). CLMN implementation: the marginal model is logistic regression and the dependence model is a Gaussian Markov network. Estimation: first, learn the marginal model as if instances were IID; next, learn the dependence model conditioned on the marginal model... but the GMN has no parameters to learn. Inference: conditional inference in copulas had not previously been considered for large-scale networks; for efficient inference we developed a message-passing algorithm based on expectation propagation (EP).

Experimental results, comparing CLMN with SocDim, RMN, LR, and GMN on Facebook, Gene, and IMDB data. Key idea: ensuring that nodes with varying graph structure have identical marginals improves learning.

Finding 2: Search. The graph+attribute space is too large to sample thoroughly, but efficient generative graph models can be exploited to search more effectively.

How to efficiently generate attributed graph samples from the underlying joint distribution P(X, Y, G)? The space of possible attributed graphs is exponential in |V|^2 + |V|p (for |V| nodes with p attributes each), so effective sampling from the joint is difficult.

Naive sampling approach: assume independence between the graph and the attributes, P(X, E | Theta_E, Theta_X) = P_E(E | Theta_E) * P_X(X | Theta_X), i.e., a graph model for the edges and a separate attribute model for the node attributes.

Problem with the naive approach: although graph structure can be captured by generative graph models, naive pairing with attribute samples does not capture relational correlation. [Figure: frequencies of edge attribute-value combinations in the original vs. the sampled network.]

Solution: use the graph model to propose edges, but sample conditional on node attribute values, P(X, E | Theta_E, Theta_X) = P_E(E | X, Theta_E, Theta_X) * P_X(X | Theta_X), using an accept-reject process to sample edges conditioned on the attributes.

Exploit an efficient generative graph model as the proposal distribution to search effectively. What to use as acceptance probabilities? The ratio of the observed probabilities in the original data to the sampled probabilities resulting from the naive approach. This corresponds to rejection sampling with proposal distribution P_E(E_ij = 1 | Theta_E) and true distribution P(E_ij = 1 | f(x_i, x_j), Theta_E, Theta_X). [Figure: edge attribute-value combination frequencies in the original vs. the naively sampled network.]

Attributed graph models (AGM) (Pfeiffer, La Fond, Moreno, Neville & Gallagher, WWW 2014). Algorithm sketch: learn the attribute and graph models; generate a graph with the naive approach; compute the acceptance ratios; sample attributes; then:

    while not enough edges:
        draw (vi, vj) from Q (the graph model)
        U ~ Uniform(0, 1)
        if U < A(xi, xj):
            put (vi, vj) into the edges
    return edges

[Figure: acceptance ratios over attribute-value combinations and the set of possible edges among example nodes.]
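
A hedged, runnable sketch of the accept-reject loop above, assuming a Chung-Lu-style proposal Q over endpoints and binary node attributes; the acceptance ratios follow the previous slide (observed combination frequencies over naive frequencies, rescaled), and all inputs are toy values rather than learned models.

    # Hedged sketch of AGM-style edge sampling: propose endpoints from a simple
    # degree-based graph model Q and accept with probability A(xi, xj), the
    # observed-to-naive frequency ratio of that attribute combination, rescaled
    # so the largest ratio is 1. Illustrative, not the paper's implementation.
    import numpy as np

    rng = np.random.default_rng(0)

    def agm_sample(degrees, attrs, observed_combo_freq, naive_combo_freq, n_edges):
        ratio = {c: observed_combo_freq[c] / naive_combo_freq[c] for c in observed_combo_freq}
        scale = max(ratio.values())
        accept = {c: r / scale for c, r in ratio.items()}         # acceptance probabilities A
        p = degrees / degrees.sum()                               # Chung-Lu proposal Q over endpoints
        edges = set()
        while len(edges) < n_edges:
            vi, vj = map(int, rng.choice(len(degrees), size=2, p=p))   # draw (vi, vj) from Q
            if vi == vj or (vi, vj) in edges or (vj, vi) in edges:
                continue
            combo = tuple(sorted((int(attrs[vi]), int(attrs[vj]))))
            if rng.uniform() < accept[combo]:                     # U ~ Uniform(0,1); accept if U < A
                edges.add((vi, vj))
        return edges

    # Toy inputs: 6 nodes, one binary attribute, homophilous observed network.
    degrees = np.array([3, 3, 2, 2, 1, 1], dtype=float)
    attrs = np.array([0, 0, 0, 1, 1, 1])
    observed = {(0, 0): 0.45, (0, 1): 0.10, (1, 1): 0.45}         # edge attribute combos in the data
    naive = {(0, 0): 0.25, (0, 1): 0.50, (1, 1): 0.25}            # combos under independent pairing
    print(agm_sample(degrees, attrs, observed, naive, n_edges=5))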

Theorem 1: AGM samples from the joint distribution of edges and attributes, P(E_ij = 1 | f(x_i, x_j), Theta_E, Theta_X) * P(x_i, x_j | Theta_X). Corollary 1: the expected AGM degree equals the expected degree of the structural graph model.

Empirical results on Facebook data: AGM preserves the characteristics of the underlying graph model, and AGM captures the attribute correlation (political views) while the naive, non-AGM samples do not. [Figure: correlation of political views across edges for the original Facebook network vs. AGM-FCL, AGM-TCL, AGM-KPGM and plain FCL, TCL, KPGM samples.] Key idea: statistical models of graphs can be exploited to improve sampling from the full joint P(E, X | Theta_E, Theta_X).

Relational learning, revisiting the four components: (1) data representation; (2) knowledge representation, where representations affect our ability to enforce invariance assumptions; (3) objective function, where conventional objective functions do not behave as expected in partially labeled networks (not in this talk); (4) search algorithm, where simpler (graph) models can be used to statistically prune the search space.

Conclusion. Relational models have been shown to significantly improve predictions through the use of joint modeling and collective inference. But since the (rolled-out) model structure depends on the structure of the underlying data network, we need to understand how the data graph affects model and algorithm characteristics in order to better exploit relational information for learning and prediction. A careful consideration of the interactions between data representation, knowledge representation, objective function, and search algorithm will improve our understanding of the mechanisms that impact performance, and this will form the foundation for improved algorithms and methodology.

Thanks to alumni and students: Hoda Eldardiry, Rongjing Xiang, Chris Mayfield, Karthik Nagaraj, Umang Sharan, Sebastian Moreno, Nesreen Ahmed, Hyokun Yun, Suvidha Kancharla, Tao Wang, Timothy La Fond, Joel Pfeiffer, Ellen Lai, Pablo Granda, Hogun Park.

Questions?! neville@cs.purdue.edu www.cs.purdue.edu/~neville